metadata
license: mit
language:
- en
base_model:
- CrabInHoney/urlbert-tiny-base-v1
pipeline_tag: text-classification
tags:
- classification
- url
- urls
- phishing
new_version: CrabInHoney/urlbert-tiny-v2-phishing-classifier
This is a very small BERT model that classifies URLs as phishing or non-phishing.
Model size: 6.53M params
Tensor type: F32
Example:
from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Local directory containing the model files.
model_path = "./urlbert-tiny-v1-phishing-classifier"
tokenizer = BertTokenizerFast.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)
model.to(device)

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    # Return the score of every class, not only the top one.
    # Newer transformers releases deprecate this flag in favor of top_k=None.
    return_all_scores=True
)

test_urls = [
    "en.wikipedia.org/wiki/",
    "facebook-profile.km6.net"
]

for url in test_urls:
    results = classifier(url)
    print(f"\nURL: {url}")
    # results[0] holds one {'label', 'score'} dict per class for this URL.
    for result in results[0]:
        label = result['label']
        score = result['score']
        print(f"Class: {label}, probability: {score:.4f}")
Output:
Using device: cuda

URL: en.wikipedia.org/wiki/
Class: good, probability: 0.9995
Class: phish, probability: 0.0005

URL: facebook-profile.km6.net
Class: good, probability: 0.0012
Class: phish, probability: 0.9988
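
The per-class scores shown above can be reduced to a single verdict per URL. The following is a minimal sketch and not part of the original model card: the helper name `is_phishing` and the default threshold of 0.5 are illustrative assumptions, while the `good` and `phish` label names are taken from the sample output. It works on the list of label/score dicts for one URL (`results[0]` in the example), so it can be tried without loading the model.

```python
from typing import List


def is_phishing(scores: List[dict], threshold: float = 0.5) -> bool:
    """Return True if the 'phish' score reaches the threshold.

    `scores` is the list of {'label': ..., 'score': ...} dicts for one URL.
    The function name and the default threshold are hypothetical choices.
    """
    by_label = {item["label"]: item["score"] for item in scores}
    # A missing 'phish' entry is treated as a score of 0.0.
    return by_label.get("phish", 0.0) >= threshold


# Scores copied from the sample output above.
wikipedia = [{"label": "good", "score": 0.9995}, {"label": "phish", "score": 0.0005}]
suspicious = [{"label": "good", "score": 0.0012}, {"label": "phish", "score": 0.9988}]

print(is_phishing(wikipedia))   # False
print(is_phishing(suspicious))  # True
```

Raising `threshold` makes the check stricter (fewer URLs flagged), lowering it flags more; a suitable value depends on how costly false positives are for the application.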