widget:
  example_title: "Example 1"
- text: "Incongruent Headlines: Yet Another Way to Mislead Your Readers </s> Emotion Cause Extraction - A Review of Various Methods and Corpora"
  example_title: "Example 2"
---

# BibTeX classification using RoBERTa

## Model Description

This model is a text classifier that predicts whether a given context paper is likely to be cited by a query paper. It takes the concatenated titles of the context and query papers as input and returns a binary prediction: `1` indicates a potential citation relationship (though not a guaranteed one), and `0` suggests no such relationship.

### Intended Use

- **Primary Use**: extracting a subset of BibTeX entries from the ACL Anthology so that the resulting bibliography stays under 50 MB; a sketch of this workflow follows below.
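
A minimal sketch of that filtering step, reusing the `predict_citation` helper defined under "How to Use" below. The `entries` list and its field names are hypothetical stand-ins for whatever BibTeX parser you use:

```python
# Hypothetical parsed BibTeX records; a real workflow would obtain these from
# a BibTeX parser rather than hard-coding them.
entries = [
    {"key": "paper-a", "title": "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples"},
    {"key": "paper-b", "title": "A Completely Unrelated Title"},
]
query_title = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"

# Keep only the entries the classifier flags as potential citations of the query paper.
kept = [e for e in entries if predict_citation(e["title"], query_title) == "include"]
print([e["key"] for e in kept])
```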

### Model Training

- **Data Description**: The model was trained on an ACL Anthology dataset, [cestwc/anthology](https://huggingface.co/datasets/cestwc/anthology), comprising pairs of paper titles. Each pair is annotated to indicate whether the context paper could potentially be cited by the query paper.
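
To peek at the training data, the dataset can be loaded straight from the Hub with the `datasets` library. A quick inspection sketch; the split and column names are not documented here, so print them rather than assuming them:

```python
from datasets import load_dataset

# Download the annotated title-pair dataset from the Hugging Face Hub.
ds = load_dataset("cestwc/anthology")
print(ds)  # shows the available splits and their column names

split = next(iter(ds.values()))  # first available split, without assuming its name
print(split[0])                  # one annotated title pair
```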

### Performance

- **Metrics**: [Include performance metrics like accuracy, precision, recall, F1-score, etc.]
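
No scores are reported yet. Once predictions on a labelled held-out split are available, the usual metrics can be computed along these lines (the `y_true`/`y_pred` values below are placeholders, not real results):

```python
from sklearn.metrics import classification_report

# Placeholder labels and predictions; substitute real model outputs on a held-out split.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

# Reports per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(y_true, y_pred, target_names=["not include", "include"]))
```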

## How to Use

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cestwc/roberta-base-bib"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_citation(context_title, query_title):
    # Join the two titles with RoBERTa's separator token, matching the widget examples above.
    inputs = tokenizer(f"{context_title} </s> {query_title}", return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    prediction = outputs.logits.argmax(-1).item()
    return "include" if prediction == 1 else "not include"

# Example
context_title = "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples"
query_title = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
print(predict_citation(context_title, query_title))
```
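
For quick experiments, the same check can also be run through the high-level `pipeline` API. The raw label names below (`LABEL_0`/`LABEL_1`) assume the checkpoint's default `id2label` mapping, so verify them on the actual output:

```python
from transformers import pipeline

# Text-classification pipeline over the same checkpoint.
clf = pipeline("text-classification", model="cestwc/roberta-base-bib")

# Same "context </s> query" input format as above.
print(clf("Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples </s> Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"))
# Expected shape: [{'label': 'LABEL_1', 'score': ...}] with the default id2label mapping
```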