Update README.md
README.md CHANGED
@@ -33,7 +33,7 @@ We also have three other MiniCheck model variants:
</p>

The performance of these models is evaluated on our newly collected benchmark (unseen by our models during training), [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact),
-from
+from 11 recent human-annotated datasets on fact-checking and grounding LLM generations. MiniCheck-RoBERTa-Large outperforms all
existing specialized fact-checkers of a similar scale by a large margin. See full results in our work.

Note: We only evaluated the performance of our models on real claims -- without any human intervention in
@@ -73,7 +73,7 @@ from minicheck.minicheck import MiniCheck
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

-# load
+# load 29K test data
df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values
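The snippet in the second hunk stops at loading the 29K-example test split. Below is a minimal sketch of how the loaded `docs` and `claims` would typically be passed to the fact-checker; the `MiniCheck(...)` constructor arguments, the model name, and the `score(docs=..., claims=...)` call are assumptions inferred from the `from minicheck.minicheck import MiniCheck` import in the hunk header, not something this diff confirms.

```python
# A minimal sketch, assuming MiniCheck exposes a scorer object whose
# score() method takes parallel lists of documents and claims.
import os

import pandas as pd
from datasets import load_dataset
from minicheck.minicheck import MiniCheck  # import shown in the hunk header

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Load the 29K-example LLM-AggreFact test split (as in the diff above).
df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")["test"])
docs = df.doc.values
claims = df.claim.values

# Hypothetical usage: model_name, device, and cache_dir values are illustrative.
scorer = MiniCheck(model_name="roberta-large", device="cuda:0", cache_dir="./ckpts")

# Assumed return signature: binary support labels and raw probabilities per doc-claim pair.
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims)
```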