Train-Test Set:
- https://github.com/L2-Regulasyon/Teknofest2023/blob/main/data/raw/teknofest_train_final.csv
- https://github.com/L2-Regulasyon/Teknofest2023/blob/main/data/external/tweetset.csv
Model: "dbmdz/bert-base-turkish-128k-uncased"
Preprocessing
- All characters are lowercased
- Punctuation is removed
- Additional non-offensive data is used
- Non-offensive sentences are truncated so that their lengths match those of the offensive ones
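The first two preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `preprocess`, the Turkish-aware handling of `I`/`İ`, the use of `string.punctuation` (ASCII punctuation only), and the whitespace collapsing are all assumptions.

```python
import string

# Assumption: map Turkish capitals explicitly, because str.lower() alone
# turns "I" into "i" (not "ı") and "İ" into "i" plus a combining dot.
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})
# Assumption: "punctuation" means the ASCII set in string.punctuation.
_PUNCT = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase the text and strip punctuation (hypothetical helper)."""
    text = text.translate(_TR_LOWER).lower()
    text = text.translate(_PUNCT)
    # Collapse whitespace runs left behind by removed punctuation.
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("Merhaba, DÜNYA!"))
```

Since the base model is an uncased checkpoint, its tokenizer may lowercase again on its own; doing it explicitly keeps the training data and inference inputs consistent.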
Tokenizer Parameters
max_length=64
padding=True
truncation=True
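One way these parameters might be passed to a Hugging Face tokenizer is sketched below. The helper names `load_tokenizer` and `encode` are hypothetical; only the model name and the three keyword arguments come from this card. Loading the tokenizer requires the `transformers` package and network access, so it is kept in a separate function.

```python
MODEL_NAME = "dbmdz/bert-base-turkish-128k-uncased"

# The three tokenizer settings listed in this card.
TOKENIZER_KWARGS = {"max_length": 64, "padding": True, "truncation": True}


def load_tokenizer(name: str = MODEL_NAME):
    """Load the base model's tokenizer (downloads files on first use)."""
    from transformers import AutoTokenizer  # imported lazily on purpose
    return AutoTokenizer.from_pretrained(name)


def encode(tokenizer, texts):
    """Tokenize a batch of texts with the card's parameters."""
    return tokenizer(list(texts), **TOKENIZER_KWARGS)
```

With `padding=True` the batch is padded to its longest sequence, and `truncation=True` together with `max_length=64` cuts anything longer than 64 tokens. A typical call would be `encode(load_tokenizer(), ["ornek cumle"])`.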
Training Parameters
- Epoch: 3
- Learning Rate: 7e-5
- Batch-Size: 64
- Tokenizer Length: 64
- Loss: BCE
- Online Hard Example Mining: On
- Class-Weighting: On (^0.3)
- Early Stopping: Off
- Stratified Batch Sampling: On
- Gradient Accumulation: Off
- LR Scheduler: Cosine-with-Warmup
- Warmup Ratio: 0.1
- Weight Decay: 0.01
- LLRD: 0.95
- Label Smoothing: 0.05
- Gradient Clipping: 1.0
- MLM Pre-Training: Off
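Three of the settings above (class weighting with exponent 0.3, LLRD 0.95, and the cosine schedule with a 0.1 warmup ratio) can be illustrated in plain Python. This is a sketch under stated assumptions, not the authors' training code: the card does not say how the exponent is applied, so the inverse-frequency formula is a guess; the class counts are borrowed from the support column of the results table; the 12 encoder layers are the usual BERT-base depth; and the schedule follows the common linear-warmup, half-cosine-decay shape.

```python
import math

# Assumption: class counts taken from the support column of the CV results.
CLASS_COUNTS = {"INSULT": 2393, "OTHER": 3528, "PROFANITY": 2376,
                "RACIST": 2033, "SEXIST": 2081}


def class_weights(counts, power=0.3):
    """Inverse-frequency weights softened by an exponent (assumed formula)."""
    total, k = sum(counts.values()), len(counts)
    return {c: (total / (k * n)) ** power for c, n in counts.items()}


def layerwise_lrs(base_lr=7e-5, decay=0.95, num_layers=12):
    """Per-layer learning rates: the top layer keeps base_lr, and each
    layer below it is multiplied by `decay` once more."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def cosine_with_warmup(step, total_steps, warmup_ratio=0.1):
    """LR multiplier: linear warmup from 0 to 1, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(class_weights(CLASS_COUNTS))
    print(layerwise_lrs()[0], layerwise_lrs()[-1])
    print([round(cosine_with_warmup(s, 1000), 3) for s in (0, 50, 100, 550, 1000)])
```

An exponent below 1 softens the weights: rarer classes still count more than frequent ones, but less than they would under plain inverse-frequency weighting.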
CV10 Results
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| INSULT | 0.8940 | 0.8918 | 0.8929 | 2393 |
| OTHER | 0.9319 | 0.9079 | 0.9197 | 3528 |
| PROFANITY | 0.9626 | 0.9533 | 0.9579 | 2376 |
| RACIST | 0.9317 | 0.9666 | 0.9488 | 2033 |
| SEXIST | 0.9388 | 0.9587 | 0.9486 | 2081 |
| accuracy | | | 0.9316 | 12411 |
| macro avg | 0.9318 | 0.9356 | 0.9336 | 12411 |
| weighted avg | 0.9316 | 0.9316 | 0.9315 | 12411 |
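The macro and weighted averages in the results can be reproduced from the per-class rows. The sketch below is only an arithmetic check on the published (already rounded) numbers, so small differences in the last digit are expected; the helper names are hypothetical.

```python
# Per-class rows from the results: (precision, recall, f1, support).
REPORT = {
    "INSULT":    (0.8940, 0.8918, 0.8929, 2393),
    "OTHER":     (0.9319, 0.9079, 0.9197, 3528),
    "PROFANITY": (0.9626, 0.9533, 0.9579, 2376),
    "RACIST":    (0.9317, 0.9666, 0.9488, 2033),
    "SEXIST":    (0.9388, 0.9587, 0.9486, 2081),
}


def macro_avg(report, col):
    """Unweighted mean of one metric column across classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(report, col):
    """Mean of one metric column weighted by each class's support."""
    total = sum(row[3] for row in report.values())
    return sum(row[col] * row[3] for row in report.values()) / total


total_support = sum(row[3] for row in REPORT.values())

if __name__ == "__main__":
    for name, col in (("precision", 0), ("recall", 1), ("f1", 2)):
        print(name, round(macro_avg(REPORT, col), 4),
              round(weighted_avg(REPORT, col), 4))
    print("support", total_support)
```

The macro average treats every class equally, while the weighted average gives the larger OTHER class more influence, which is why the two rows differ slightly.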