metadata
license: apache-2.0
base_model: microsoft/deberta-base
tags:
  - deberta
  - human value detection
  - text classification
  - multi-label classification
model-index:
  - name: deberta-based-human-value-detection
    results: []

Description

The Human Value Detection task at CLEF 2024 consists of two sub-tasks: the first is to detect the presence or absence of each of 19 human values in a sentence, and the second is to determine whether each value is attained or constrained.

Our system uses a cascade model approach for the detection and stance classification of the predefined set of human values. It consists of two subsystems: one detects the presence of each human value, and the other establishes the stance towards each human value (whether the sentence attains or constrains it). Each subsystem is designed and fine-tuned separately, using a DeBERTa model as the base.

  • Subsystem 1: identifies the presence of human values within sentences. It merges the "attained" and "constrained" labels into a single presence label, which reduces the multi-label classification task to a binary decision (presence vs. absence) for each of the 19 human values.
  • Subsystem 2: receives the outputs of Subsystem 1 and classifies the stance towards each detected human value as a binary decision (attained vs. constrained). It transforms the sentence dataset into premise-hypothesis pairs, where each sentence is the premise, a value is the hypothesis, and the "attained" or "constrained" label is the stance.
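
The two data transformations described in the list can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the annotation layout (a pair of binary `attained` and `constrained` flags per value) and the function names `to_presence_labels` and `to_stance_pairs` are assumptions made for this example.

```python
def to_presence_labels(annotations):
    """Subsystem 1 view: a value counts as present if it is attained or constrained.

    `annotations` maps each value name to a hypothetical dict of the form
    {"attained": 0 or 1, "constrained": 0 or 1}.
    """
    return {
        value: int(bool(flags["attained"] or flags["constrained"]))
        for value, flags in annotations.items()
    }


def to_stance_pairs(sentence, annotations):
    """Subsystem 2 view: one (premise, hypothesis, stance) triple per set flag.

    Assumption of this sketch: a value flagged both ways yields two triples.
    """
    pairs = []
    for value, flags in annotations.items():
        if flags["attained"]:
            pairs.append((sentence, value, "attained"))
        if flags["constrained"]:
            pairs.append((sentence, value, "constrained"))
    return pairs


# Invented annotation for a single sentence, for illustration only
sentence = "We would like to share this model with the research community."
annotations = {
    "Achievement": {"attained": 1, "constrained": 0},
    "Tradition": {"attained": 0, "constrained": 1},
    "Hedonism": {"attained": 0, "constrained": 0},
}

presence = to_presence_labels(annotations)
pairs = to_stance_pairs(sentence, annotations)
```

Here `presence` marks "Achievement" and "Tradition" as present, and `pairs` contains one premise-hypothesis triple for each of them; "Hedonism" produces no pair because neither flag is set.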

Because Subsystem 1 detects which human values are present in the text and Subsystem 2 classifies the stance towards each detected value, the cascade separates detection from stance classification and yields a finer-grained labelling of each sentence.
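
The control flow of such a cascade can be sketched as below. This is only an outline of the idea under stated assumptions: `run_cascade`, `detect_values` and `classify_stance` are hypothetical names, and the two toy functions stand in for the real subsystems so that the sketch runs without downloading any model.

```python
def run_cascade(sentence, detect_values, classify_stance):
    """Run detection first, then classify the stance of each detected value.

    Both callables are hypothetical stand-ins:
      detect_values(sentence) -> list of value names judged present
      classify_stance(premise, hypothesis) -> "attained" or "constrained"
    """
    results = {}
    for value in detect_values(sentence):
        # The sentence is the premise and the detected value is the hypothesis
        results[value] = classify_stance(sentence, value)
    return results


# Toy stand-ins (keyword rules, NOT the real models)
def toy_detect(sentence):
    return ["Achievement"] if "win" in sentence else []


def toy_stance(premise, hypothesis):
    return "constrained" if "cannot" in premise else "attained"


demo = run_cascade("They cannot win this way.", toy_detect, toy_stance)
```

Values that the first stage does not detect never reach the second stage, so the stance classifier only sees sentence-value pairs where a value is judged present.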

This model implements Subsystem 1 and addresses the first sub-task.

How to use

You can use this model with a text classification pipeline, as in the following example:

from transformers import pipeline

model = "VictorYeste/deberta-based-human-value-detection"
tokenizer = "VictorYeste/deberta-based-human-value-detection"

values_detection = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)

values_detection("We would like to share this model with the research community.")

This returns the following:

[[{'label': 'Self-direction: thought', 'score': 0.02448045276105404},
  {'label': 'Stimulation', 'score': 0.01451807003468275},
  {'label': 'Universalism: concern', 'score': 0.006046739872545004},
  {'label': 'Self-direction: action', 'score': 0.004837467335164547},
  {'label': 'Benevolence: dependability', 'score': 0.001295178197324276},
  {'label': 'Benevolence: caring', 'score': 0.0009907316416501999},
  {'label': 'Conformity: interpersonal', 'score': 0.0004476217145565897},
  {'label': 'Security: societal', 'score': 0.00039295252645388246},
  {'label': 'Universalism: tolerance', 'score': 0.0003538706514518708},
  {'label': 'Power: dominance', 'score': 0.00016191638133022934},
  {'label': 'Power: resources', 'score': 0.0001522471575299278},
  {'label': 'Universalism: nature', 'score': 0.00014803129306528717},
  {'label': 'Humility', 'score': 0.0001100009903893806},
  {'label': 'Face', 'score': 9.083452459890395e-05},
  {'label': 'Conformity: rules', 'score': 8.524076838511974e-05},
  {'label': 'Achievement', 'score': 6.411433423636481e-05},
  {'label': 'Security: personal', 'score': 5.183048051549122e-05},
  {'label': 'Hedonism', 'score': 3.167059549014084e-05},
  {'label': 'Tradition', 'score': 2.4977327484521084e-05}]]
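
To turn these scores into a set of detected values, one option is to keep every label whose score reaches a cutoff. The helper below is a small sketch: `select_values` is a hypothetical name, and the default cutoff of 0.5 is an assumption that may need tuning for a given use case.

```python
def select_values(pipeline_output, threshold=0.5):
    """Keep the labels whose score reaches `threshold`.

    `pipeline_output` is the list of {'label', 'score'} dicts returned
    for ONE input text, i.e. the inner list of the pipeline result.
    """
    return [item["label"] for item in pipeline_output if item["score"] >= threshold]


# First three entries of the example output, with scores rounded
scores = [
    {"label": "Self-direction: thought", "score": 0.0245},
    {"label": "Stimulation", "score": 0.0145},
    {"label": "Universalism: concern", "score": 0.0060},
]

present = select_values(scores)        # no score reaches 0.5
lenient = select_values(scores, 0.01)  # illustrative lower cutoff
```

With the default cutoff the example sentence yields no detected value, since its highest score is about 0.024.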

The model was trained on a multi-label problem, so it can also be used to predict several labels at once, as follows:

import torch
import numpy as np
import transformers

def multilabel_pipeline(text, model, tokenizer, id2label):
    """Return the labels of the human values predicted as present in `text`."""
    # Code adapted from: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BERT/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb
    encoding = tokenizer(text, return_tensors="pt")
    # Move the inputs to the same device as the model
    encoding = {k: v.to(model.device) for k, v in encoding.items()}
    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**encoding)
    logits = outputs.logits
    # Apply an independent sigmoid per label (multi-label), not a softmax
    probs = torch.sigmoid(logits.squeeze(0)).cpu().numpy()
    # A value is predicted as present when its probability reaches 0.5
    predictions = np.zeros(probs.shape)
    predictions[np.where(probs >= 0.5)] = 1
    predicted_labels = [id2label[idx] for idx, label in enumerate(predictions) if label == 1.0]
    return predicted_labels

values = ["Self-direction: thought", "Self-direction: action", "Stimulation", "Hedonism", "Achievement", "Power: dominance", "Power: resources", "Face", "Security: personal", "Security: societal", "Tradition", "Conformity: rules", "Conformity: interpersonal", "Humility", "Benevolence: caring", "Benevolence: dependability", "Universalism: concern", "Universalism: nature", "Universalism: tolerance"]
id2label = {idx: label for idx, label in enumerate(values)}
model_name = "VictorYeste/deberta-based-human-value-detection"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)