---
title: Negbleurt
emoji: 🌖
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 3.38.0
app_file: app.py
pinned: false
license: mit
---

# Metric Card for NegBLEURT

## Metric Description

NegBLEURT is the negation-aware version of the BLEURT metric. It can be used to evaluate generated text against a reference.

BLEURT is a learned evaluation metric for natural language generation. It is built in multiple phases of transfer learning: starting from a pretrained BERT model (Devlin et al. 2018), it undergoes a further pre-training phase on synthetic data and is finally trained on WMT human annotations and the CANNOT negation awareness dataset.

## How to Use

At minimum, this metric requires predictions and references as inputs.

```python
>>> import evaluate
>>> negBLEURT = evaluate.load('tum-nlp/negbleurt')
>>> predictions = ["Ray Charles is a legend.", "Ray Charles isn't legendary."]
>>> references = ["Ray Charles is legendary.", "Ray Charles is legendary."]
>>> results = negBLEURT.compute(predictions=predictions, references=references)
>>> print(results)
{'negBLEURT': [0.8409, 0.2601]}
```

### Inputs

- **predictions** (list of `str`): predictions to score.
- **references** (list of `str`): references, one for each prediction.
- **batch_size** (`int`, optional): batch size for model inference. Defaults to 16.
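
The batching implied by `batch_size` is simple chunking of the input pairs. A minimal sketch of that logic; the `batched` helper is purely illustrative and not part of the metric's API:

```python
def batched(predictions, references, batch_size=16):
    """Yield (predictions, references) chunks of at most batch_size pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    for start in range(0, len(predictions), batch_size):
        yield (predictions[start:start + batch_size],
               references[start:start + batch_size])

preds = [f"prediction {i}" for i in range(5)]
refs = [f"reference {i}" for i in range(5)]
batches = list(batched(preds, refs, batch_size=2))
print([len(p) for p, _ in batches])  # → [2, 2, 1]
```

Note that mismatched input lengths raise an error up front rather than silently truncating.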

### Output Values

- **negBLEURT** (list of `float`): NegBLEURT scores, one per prediction. Values usually range between 0 and 1, where 1 indicates a perfect prediction and 0 indicates a poor fit.

Output example:

```python
{'negBLEURT': [0.8409, 0.2601]}
```

This metric outputs a dictionary containing the list of negBLEURT scores.
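
Since `compute` returns one score per prediction, it is often useful to pair the scores back with their predictions or to aggregate them into a corpus-level value. A minimal post-processing sketch, reusing the example scores; the variable names are illustrative:

```python
# Example output of compute (scores taken from the usage example)
results = {'negBLEURT': [0.8409, 0.2601]}
predictions = ["Ray Charles is a legend.", "Ray Charles isn't legendary."]

# Pair each prediction with its score
scored = list(zip(predictions, results['negBLEURT']))

# Corpus-level average score
average = sum(results['negBLEURT']) / len(results['negBLEURT'])

for text, score in scored:
    print(f"{score:.4f}  {text}")
print(f"average: {average:.4f}")  # → average: 0.5505
```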

## Limitations and Bias

This metric is based on BERT (Devlin et al. 2018) and as such inherits its biases and weaknesses. However, it was trained in a negation-aware setting and thus overcomes BERT's issues with negation awareness.

Currently, NegBLEURT is only available for English.

## Citation(s)

```bibtex
tba
```

## Further References

- The original [NegBLEURT GitHub repo](https://github.com/MiriUll/negation_aware_evaluation)