update README.md
README.md
CHANGED
@@ -10,16 +10,19 @@ inference: false
|
|
10 |
# Description
|
11 |
A fine-tuned multi-class classification model that detects four different types of uncertainty cues (a.k.a. hedges) at the token level.
|
12 |
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
|
19 |
# Intended uses and limitations
|
20 |
- The model was fine-tuned with the [Simple Transformers](https://simpletransformers.ai/) library. This library is based on Transformers, but the model cannot be used directly with the Transformers `pipeline` and classes; doing so would generate incorrect outputs. For this reason, the API on this page is disabled.
|
21 |
|
22 |
-
|
23 |
To generate predictions with the model, use the [Simple Transformers](https://simpletransformers.ai/) library:
|
24 |
```
|
25 |
from simpletransformers.ner import NERModel
|
@@ -58,13 +61,7 @@ In other words, the token 'perhaps' is recognized as an **epistemic uncertainty
|
|
58 |
# Training Data
|
59 |
HEDGEhog is trained and evaluated on the [Szeged Uncertainty Corpus](https://rgai.inf.u-szeged.hu/node/160) (Szarvas et al. 2012<sup>1</sup>). The original sentence-level XML version of this dataset is available [here](https://rgai.inf.u-szeged.hu/node/160).
|
60 |
|
61 |
-
The token-level version that was used for training can be downloaded from [here](https://1drv.ms/u/s!AvPkt_QxBozXk7BiazucDqZkVxLo6g?e=IisuM6) in the form of pickled pandas DataFrames. You can download either the split sets (```train.pkl``` 137MB, ```test.pkl``` 17MB, ```dev.pkl``` 17MB) or the full dataset (```szeged_fixed.pkl``` 172MB). Each row in the DataFrame contains a token, its features (these are not relevant for HEDGEhog; they were used to train the baseline CRF model, see [here](https://github.com/vanboefer/uncertainty_crf)), its sentence ID, and its label.
|
62 |
-
|
63 |
-
- E: epistemic
|
64 |
-
- I: investigation
|
65 |
-
- D: doxastic
|
66 |
-
- N: condition
|
67 |
-
- C: the token is **not** an uncertainty cue
|
68 |
|
69 |
# Training Procedure
|
70 |
The following training parameters were used:
|
|
|
10 |
# Description
|
11 |
A fine-tuned multi-class classification model that detects four different types of uncertainty cues (a.k.a. hedges) at the token level.
|
12 |
|
13 |
+
# Uncertainty types
|
14 |
+
label | type | description | example
|
15 |
+
---| ---| ---| ---
|
16 |
+
E | Epistemic | The proposition is possible, but its truth-value cannot be decided at the moment. | She **may** be already asleep.
|
17 |
+
I | Investigation | The proposition is in the process of having its truth-value determined. | She **examined** the role of NF-kappaB in protein activation.
|
18 |
+
D | Doxastic | The proposition expresses beliefs and hypotheses, which may be known as true or false by others. | She **believes** that the Earth is flat.
|
19 |
+
N | Condition | The proposition is true or false based on the truth-value of another proposition. | **If** she gets the job, she will move to Utrecht.
|
20 |
+
C | *certain* | *n/a* | *n/a*
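As a rough illustration of how the label codes in this table might be consumed downstream, the sketch below maps each code to its uncertainty type and collects the tokens that are not labeled `C`. It assumes predictions arrive as a list of one-entry `{token: label}` dicts per sentence, which is the shape Simple Transformers' `NERModel.predict` is expected to return. `LABEL_TYPES`, `extract_cues`, and the example sentence are hypothetical, hand-written names and data, not actual model output.

```python
# Label codes from the table mapped to readable uncertainty types.
# "C" (certain) is deliberately left out: it marks tokens that are not cues.
LABEL_TYPES = {
    "E": "epistemic",
    "I": "investigation",
    "D": "doxastic",
    "N": "condition",
}


def extract_cues(sentence_predictions):
    """Return (token, uncertainty type) pairs for every token that is a cue.

    `sentence_predictions` is assumed to be a list of one-entry dicts,
    e.g. [{"She": "C"}, {"may": "E"}], one dict per token.
    """
    cues = []
    for token_dict in sentence_predictions:
        for token, label in token_dict.items():
            if label in LABEL_TYPES:
                cues.append((token, LABEL_TYPES[label]))
    return cues


# Hand-written example in the assumed format (not real model output).
example = [{"She": "C"}, {"may": "E"}, {"be": "C"}, {"already": "C"}, {"asleep.": "C"}]
print(extract_cues(example))  # [('may', 'epistemic')]
```

Keeping `C` out of the mapping means unknown or non-cue labels are silently skipped rather than raising an error; a stricter variant could validate labels instead.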
|
21 |
|
22 |
# Intended uses and limitations
|
23 |
- The model was fine-tuned with the [Simple Transformers](https://simpletransformers.ai/) library. This library is based on Transformers, but the model cannot be used directly with the Transformers `pipeline` and classes; doing so would generate incorrect outputs. For this reason, the API on this page is disabled.
|
24 |
|
25 |
+
# How to use
|
26 |
To generate predictions with the model, use the [Simple Transformers](https://simpletransformers.ai/) library:
|
27 |
```
|
28 |
from simpletransformers.ner import NERModel
|
|
|
61 |
# Training Data
|
62 |
HEDGEhog is trained and evaluated on the [Szeged Uncertainty Corpus](https://rgai.inf.u-szeged.hu/node/160) (Szarvas et al. 2012<sup>1</sup>). The original sentence-level XML version of this dataset is available [here](https://rgai.inf.u-szeged.hu/node/160).
|
63 |
|
64 |
+
The token-level version that was used for training can be downloaded from [here](https://1drv.ms/u/s!AvPkt_QxBozXk7BiazucDqZkVxLo6g?e=IisuM6) in the form of pickled pandas DataFrames. You can download either the split sets (```train.pkl``` 137MB, ```test.pkl``` 17MB, ```dev.pkl``` 17MB) or the full dataset (```szeged_fixed.pkl``` 172MB). Each row in the DataFrame contains a token, its features (these are not relevant for HEDGEhog; they were used to train the baseline CRF model, see [here](https://github.com/vanboefer/uncertainty_crf)), its sentence ID, and its label.
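A minimal sketch of how such a token-per-row DataFrame might be regrouped into sentences is shown below. The column names `token`, `sentence_id`, and `label` are assumptions made for illustration; the actual column names in the pickled files are not stated here and should be checked after loading. The helper `sentences_from_frame` is hypothetical, and the toy frame stands in for the real data.

```python
import pandas as pd


def sentences_from_frame(df, token_col="token", sent_col="sentence_id", label_col="label"):
    """Group token rows into (tokens, labels) pairs, one pair per sentence.

    Column names are assumptions; pass the real ones if they differ.
    """
    sentences = []
    # sort=False keeps sentences in the order they first appear in the frame.
    for _, group in df.groupby(sent_col, sort=False):
        sentences.append((group[token_col].tolist(), group[label_col].tolist()))
    return sentences


# Tiny stand-in frame. With a downloaded split, one would instead load it with
# df = pd.read_pickle("train.pkl") and inspect df.columns first.
toy = pd.DataFrame({
    "token": ["She", "may", "sleep", "If", "so"],
    "sentence_id": [0, 0, 0, 1, 1],
    "label": ["C", "E", "C", "N", "C"],
})
print(sentences_from_frame(toy))
```

Grouping by sentence ID rather than relying on row order alone keeps the token and label lists aligned even if the frame has been filtered beforehand.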
|
65 |
|
66 |
# Training Procedure
|
67 |
The following training parameters were used:
|