license: mit
language:
  - en
base_model:
  - CrabInHoney/urlbert-tiny-base-v1
pipeline_tag: text-classification
tags:
  - classification
  - url
  - urls
  - phishing
new_version: CrabInHoney/urlbert-tiny-v2-phishing-classifier

This is a very small BERT model designed to classify URLs as phishing or non-phishing.

Model size: 6.53M params

Tensor type: F32
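As a rough sanity check, the parameter count and tensor type above give an estimate of the size of the weights. The sketch below assumes 4 bytes per F32 parameter and uses the rounded figure of 6.53M, so the result is approximate; `weight_size_bytes` is a hypothetical helper, not part of the model or of `transformers`.

```python
# Rough estimate of the weight size from the figures listed above.
PARAM_COUNT = 6_530_000   # 6.53M parameters (rounded, as listed above)
BYTES_PER_PARAM = 4       # F32 = 32 bits = 4 bytes


def weight_size_bytes(param_count: int, bytes_per_param: int) -> int:
    """Hypothetical helper: raw size of the weights in bytes."""
    return param_count * bytes_per_param


size = weight_size_bytes(PARAM_COUNT, BYTES_PER_PARAM)
print(f"{size / 1e6:.2f} MB")     # prints "26.12 MB"
print(f"{size / 2**20:.2f} MiB")  # prints "24.91 MiB"
```

This counts only the weights; tokenizer files and runtime buffers add to the actual footprint.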

Dataset

Example:

from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Local directory containing the model and tokenizer files.
model_path = "./urlbert-tiny-v1-phishing-classifier"

tokenizer = BertTokenizerFast.from_pretrained(model_path)

model = BertForSequenceClassification.from_pretrained(model_path)
model.to(device)

# return_all_scores=True returns a score for every class, not only the top one.
# Recent transformers releases deprecate it in favor of top_k=None.
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    return_all_scores=True
)

test_urls = [
    "en.wikipedia.org/wiki/",
    "facebook-profile.km6.net"
]

for url in test_urls:
    results = classifier(url)
    print(f"\nURL: {url}")
    # results[0] holds one {'label', 'score'} dict per class for this URL.
    for result in results[0]:
        label = result['label']
        score = result['score']
        print(f"Class: {label}, probability: {score:.4f}")

Output:

Using device: cuda

URL: en.wikipedia.org/wiki/
Class: good, probability: 0.9995
Class: phish, probability: 0.0005

URL: facebook-profile.km6.net
Class: good, probability: 0.0012
Class: phish, probability: 0.9988
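
The per-class scores shown above can be reduced to a single verdict per URL. The following is a minimal sketch, assuming the label names `good` and `phish` from the output above; `phishing_verdict` and the default threshold of 0.5 are hypothetical choices for illustration, not part of the model or of `transformers`.

```python
from typing import Dict, List, Tuple


def phishing_verdict(
    scores: List[Dict[str, float]], threshold: float = 0.5
) -> Tuple[str, float]:
    """Hypothetical helper: turn per-class scores into (verdict, phish_score).

    `scores` is the list of {'label', 'score'} dicts for one URL,
    i.e. `results[0]` in the example above.
    """
    by_label = {item["label"]: item["score"] for item in scores}
    # Treat a missing 'phish' entry as a score of 0.0.
    phish_score = by_label.get("phish", 0.0)
    verdict = "phish" if phish_score >= threshold else "good"
    return verdict, phish_score


# Scores copied from the example output above.
sample = [
    {"label": "good", "score": 0.0012},
    {"label": "phish", "score": 0.9988},
]
print(phishing_verdict(sample))  # prints ('phish', 0.9988)
```

Raising `threshold` trades missed phishing URLs for fewer false alarms; a suitable value depends on the application and should be chosen on held-out data.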

License

MIT