---
license: mit
language:
- en
library_name: transformers
pipeline_tag: token-classification
tags:
- Social Bias
metrics:
- name: F1
  type: F1
  value: 0.7864
- name: Recall
  type: Recall
  value: 0.7617
thumbnail: >-
  https://media.licdn.com/dms/image/v2/D4E12AQH-g6TfVlad0g/article-cover_image-shrink_720_1280/article-cover_image-shrink_720_1280/0/1724391684857?e=1729728000&v=beta&t=e3ggmXGVKaVU6e72wjsc9Ppgd0rigQqjeA1Od9fyFDk
base_model: bert-base-uncased
co2_eq_emissions:
  emissions: 8
  training_type: fine-tuning
  geographical_location: Phoenix, AZ
  hardware_used: T4
---
# Social Bias NER
This NER model is fine-tuned from BERT for multi-label token classification of:
- (GEN)eralizations
- (UNFAIR)ness
- (STEREO)types
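Because this is multi-label token classification, spans can overlap: a single token may carry tags from more than one category at once. A hypothetical output (for illustration only, not a guaranteed prediction) might look like:

```json
[
    {"token": "tall", "labels": ["B-GEN", "B-STEREO"]},
    {"token": "people", "labels": ["I-GEN", "I-STEREO"]},
    {"token": "are", "labels": ["I-STEREO"]},
    {"token": "better", "labels": ["I-STEREO", "B-UNFAIR"]}
]
```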
You can try it out in Spaces :).
## How to Get Started with the Model
The Transformers pipeline doesn't have a class for multi-label token classification, but you can use this code to load the model, run it, and format the output:
```python
import json

import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# load the tokenizer and the fine-tuned model
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('maximuspowers/bias-detection-ner')
model.eval()
model.to('cuda' if torch.cuda.is_available() else 'cpu')

# ids to labels we want to display
id2label = {
    0: 'O',
    1: 'B-STEREO',
    2: 'I-STEREO',
    3: 'B-GEN',
    4: 'I-GEN',
    5: 'B-UNFAIR',
    6: 'I-UNFAIR'
}

# predict function you'll want to use in your own code
def predict_ner_tags(sentence):
    inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=128)
    input_ids = inputs['input_ids'].to(model.device)
    attention_mask = inputs['attention_mask'].to(model.device)

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        # sigmoid instead of softmax, since each token can carry multiple labels
        probabilities = torch.sigmoid(logits)
        predicted_labels = (probabilities > 0.5).int()  # remember to try your own threshold

    # map each non-special token to its predicted labels
    result = []
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
    for i, token in enumerate(tokens):
        if token not in tokenizer.all_special_tokens:
            label_indices = (predicted_labels[0][i] == 1).nonzero(as_tuple=False).squeeze(-1)
            labels = [id2label[idx.item()] for idx in label_indices] if label_indices.numel() > 0 else ['O']
            result.append({"token": token, "labels": labels})

    return json.dumps(result, indent=4)
```
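A minimal usage sketch (the sentence is just an example; actual predictions depend on the model weights and the threshold you pick):

```python
# prints a JSON list of {"token": ..., "labels": [...]} entries
print(predict_ner_tags("Tall people are better at basketball."))
```

If too many or too few tokens come back tagged, adjust the 0.5 sigmoid threshold inside `predict_ner_tags`: lowering it trades precision for recall, and raising it does the opposite.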