Model Card for uget/sexual_content_dection
Detect sexual content in text or file names.
Model Details
Model Description
- Developed by: Liu Wei
- License: MIT
- Finetuned from model: bert-base-multilingual-cased
- Task: Binary text classification (sexual / non-sexual)
- Language: Multilingual
- Max sequence length: 128 tokens
- Last updated: 2024-08-22
Model Training Information
- Training dataset size: 100,000 manually annotated samples (the labels contain some noise)
- Class distribution: 50:50
- Batch Size: 8
- Epochs: 5
- Accuracy: 92%
- F1: 92%
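The card reports accuracy and F1 without naming the evaluation split or the averaging method. As a minimal sketch, assuming binary labels with 1 (Sexual) as the positive class, both metrics can be computed from gold labels and predictions as follows; `binary_metrics` is a hypothetical helper, not part of the model repository.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, f1) for binary labels, treating 1 as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```

For the four-item example above, accuracy is 0.75 and F1 is about 0.8 (precision 1.0, recall 2/3).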
Uses
- Supports multiple languages, including English, Chinese, and Japanese.
- Intended for screening user-submitted content on forums, magnet-link search engines, cloud drives, and similar services.
- Detects sexual content from its meaning as well as from obfuscated variants, such as porn movie catalog numbers and altered file names.
- According to the author, detection accuracy is substantially higher than that of GPT-4o mini.
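For the forum or cloud-drive use case, screening a batch of submissions reduces to calling the classifier on each name and keeping the ones it flags. The sketch below assumes a `predict` function like the one in "How to Get Started with the Model", returning a dict with a `predictions` key. `flag_submissions` and `fake_predict` are hypothetical names; the predictor is passed in so the screening logic can be tried without loading the model.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, object]


def flag_submissions(names: List[str], predict: Callable[[str], Prediction]) -> List[str]:
    """Return the submissions the classifier marks as sexual (predictions == 1).

    Assumption: `predict` takes one string and returns
    {"predictions": 0 or 1, "label": "None" or "Sexual"}.
    """
    flagged = []
    for name in names:
        if predict(name)["predictions"] == 1:
            flagged.append(name)
    return flagged


def fake_predict(text: str) -> Prediction:
    # Hypothetical stand-in for the real model, for illustration only
    hit = "DVAJ" in text
    return {"predictions": int(hit), "label": "Sexual" if hit else "None"}


print(flag_submissions(["DVAJ-548_CH_SD", "holiday_photos.zip"], fake_predict))
# ['DVAJ-548_CH_SD']
```

In real use, `fake_predict` would be replaced by the model-backed `predict`.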
Examples
- English example
predict("Tiffany Doll - Wine Makes Me Anal (31.03.2018)_1080p.mp4")
{
"predictions": 1,
"label": "Sexual"
}
- Chinese example
predict("橙子 · 保安和女业主的一夜春宵。路见不平拔刀相助,救下苏姐,以身相许!")
{
"predictions": 1,
"label": "Sexual"
}
- Japanese example
predict("MILK-217-UNCENSORED-LEAKピタコス Gカップ痴女 完全着衣で濃密5PLAY 椿りか 580 2.TS")
{
"predictions": 1,
"label": "Sexual"
}
- Porn movie catalog number example
predict("DVAJ-548_CH_SD")
{
"predictions": 1,
"label": "Sexual"
}
How to Get Started with the Model
Step 1:
Create a Python file such as 'use_model.py' with the following contents:
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
tokenizer = BertTokenizer.from_pretrained("uget/sexual_content_dection")
model = BertForSequenceClassification.from_pretrained("uget/sexual_content_dection")
model.eval()  # inference mode (disables dropout)

def predict(text):
    # Tokenize; the model was trained with a max length of 128 tokens
    encoding = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    encoding = {k: v.to(model.device) for k, v in encoding.items()}
    with torch.no_grad():  # no gradients needed for inference
        outputs = model(**encoding)
    # One score per class; argmax picks the higher-scoring class
    probs = torch.sigmoid(outputs.logits)
    predictions = torch.argmax(probs, dim=-1)
    label_map = {0: "None", 1: "Sexual"}
    predicted_label = label_map[predictions.item()]
    print(f"Predictions:{predictions.item()}, Label:{predicted_label}")
    return {"predictions": predictions.item(), "label": predicted_label}

predict("Tiffany Doll - Wine Makes Me Anal (31.03.2018)_1080p.mp4")
Step 2:
Run the script:
python3 use_model.py
The script prints "Predictions:1, Label:Sexual", and predict() returns:
{
"predictions": 1,
"label": "Sexual"
}
Explanation
The result is always one of two cases:
- predictions = 0 (label "None"): no sexual content detected;
- predictions = 1 (label "Sexual"): sexual content detected.
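The example script applies torch.sigmoid to the logits and then takes the argmax. Because the sigmoid is strictly increasing, this selects the same class as taking the argmax of the raw logits; if scores that sum to 1 across the two classes are wanted, softmax is the usual choice for a two-output head. A minimal pure-Python sketch of this decoding step follows. The logit values are made up for illustration, and `decode` is a hypothetical helper.

```python
import math

LABEL_MAP = {0: "None", 1: "Sexual"}  # same mapping as in predict()


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


def decode(logits):
    """Map a pair of logits to the {"predictions", "label"} dict described above."""
    pred = argmax(logits)
    return {"predictions": pred, "label": LABEL_MAP[pred]}


logits = [-1.2, 2.3]  # hypothetical model output for one input
# Sigmoid is strictly increasing and softmax preserves the ordering of its
# inputs, so all three argmaxes select the same class.
assert argmax(logits) == argmax([sigmoid(x) for x in logits]) == argmax(softmax(logits))
print(decode(logits))  # {'predictions': 1, 'label': 'Sexual'}
```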
Model Card Contact
Email: jack813@gmail.com
Model tree for uget/sexual_content_dection
Base model
google-bert/bert-base-multilingual-cased