cestwc committed
Commit dc99ac2
1 Parent(s): 5dd8485

Update README.md

Files changed (1): README.md +34 -1
  example_title: "Example 1"
  - text: "Incongruent Headlines: Yet Another Way to Mislead Your Readers </s> Emotion Cause Extraction - A Review of Various Methods and Corpora"
  example_title: "Example 2"
---

# BibTeX classification using RoBERTa

## Model Description
This model is a text classifier that predicts the likelihood that a given context paper is cited by a query paper. It takes the concatenated titles of the context and query papers as input and outputs a binary prediction: `1` indicates a potential citation relationship (though not necessarily an actual citation), and `0` suggests no such relationship.

### Intended Use
- **Primary Use**: To extract a subset of BibTeX entries from the ACL Anthology so that the resulting file is under 50 MB.
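The filtering workflow this implies can be sketched as follows. Note that `filter_entries`, the entry keys, and the dummy predicate here are hypothetical illustrations, not part of the released model; in practice `keep` would wrap the classifier (e.g. return `True` when it predicts `"include"`):

```python
# Hypothetical sketch: keep only BibTeX entries whose titles the classifier
# predicts could be cited by the query paper.
# `keep` is any callable (context_title, query_title) -> bool; in practice it
# would wrap the model's prediction rather than this dummy string check.

def filter_entries(entries, query_title, keep):
    """entries: iterable of (bibtex_key, title) pairs; returns the kept keys."""
    return [key for key, title in entries if keep(title, query_title)]

# Usage with a dummy predicate standing in for the real model:
entries = [
    ("lee2020emotion", "Emotion Cause Extraction - A Review of Various Methods and Corpora"),
    ("doe2021pasta", "Cooking Pasta at Scale"),
]
kept = filter_entries(entries, "A Survey of Emotion Analysis",
                      keep=lambda context, query: "Emotion" in context)
print(kept)  # ['lee2020emotion']
```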

### Model Training
- **Data Description**: The model was trained on an ACL Anthology dataset, [cestwc/anthology](https://huggingface.co/datasets/cestwc/anthology), comprising pairs of paper titles. Each pair was annotated to indicate whether the context paper could potentially be cited by the query paper.

### Performance
- **Metrics**: [Include performance metrics like accuracy, precision, recall, F1-score, etc.]

## How to Use
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cestwc/roberta-base-bib"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_citation(context_title, query_title):
    # Join the two titles with RoBERTa's separator token, matching the
    # format of the widget examples above
    inputs = tokenizer(f"{context_title} </s> {query_title}", return_tensors="pt")
    outputs = model(**inputs)
    prediction = outputs.logits.argmax(-1).item()
    return "include" if prediction == 1 else "not include"

# Example
context_title = "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples"
query_title = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
print(predict_citation(context_title, query_title))
```
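If a confidence score is preferred over the hard `argmax` label, the two class logits can be converted to probabilities with a softmax. A minimal stdlib sketch, where the logits list is a made-up stand-in for `outputs.logits` (it mirrors what `torch.softmax(outputs.logits, dim=-1)` would compute):

```python
import math

# Stand-in values for the two class logits (class 0 = not include, class 1 = include)
logits = [0.2, 1.4]

# Softmax: exponentiate and normalize so the two values sum to 1
exps = [math.exp(x) for x in logits]
total = sum(exps)
include_prob = exps[1] / total  # probability assigned to class 1 ("include")
print(round(include_prob, 3))  # 0.769
```

With real model outputs, thresholding `include_prob` (rather than taking the argmax) lets you trade off how aggressively entries are filtered.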