Comparison with LLMs
Have you tried comparing the results with LLMs such as llama3-70b? On the example text it does quite well:
```
Testes <person>Prandote de Sproua</person> contra <person>Sbisconem</person> :
<person>Sbisco de Olesnicza</person>, quod scit et testatur <person>Czescz</person>,
<person>Sbisco de Marczinczouicz</person> in testimonium, <person>Stanislaus Lantka</person> in
testimonium, <person>Wirzchoslaus</person> in testimonium, <person>Benik de Thopola</person>,
<person>Thomas de Tzsczenecz</person>.
```
As far as I know (maybe @novacellus can prove me wrong on that one), there has been no attempt to compare these models with any LLMs. Their results were compared only with the ground truth provided by the PAN IJP.
Maybe it would be a good starting point for another study.
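If such a study compared LLM output against the same ground truth, a first step could be to pull the tagged names out of the LLM response and score them. The sketch below is only an illustration under simple assumptions: names are compared as exact strings (not character offsets, which a proper NER evaluation would likely use), and the helper names `extract_persons` and `score` are hypothetical, not part of any existing tooling.

```python
import re
from collections import Counter

# Matches <person>...</person> regardless of case, since the outputs in this
# thread use both <person> and <Person>.
PERSON_TAG = re.compile(r"<person>(.*?)</person>", re.IGNORECASE | re.DOTALL)


def extract_persons(tagged_text: str) -> list[str]:
    """Return tagged person names, with internal whitespace/line breaks collapsed."""
    return [" ".join(m.split()) for m in PERSON_TAG.findall(tagged_text)]


def score(predicted: list[str], gold: list[str]) -> dict[str, float]:
    """Exact-string precision, recall and F1, treating names as multisets."""
    # Counter intersection keeps the minimum count of each name, so a name
    # predicted twice but annotated once counts as one true positive.
    tp = sum((Counter(predicted) & Counter(gold)).values())
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Usage with a made-up gold list (NOT the PAN IJP annotations):
llm_output = "<person>Sbisco de Olesnicza</person>, quod scit et testatur <person>Czescz</person>"
hypothetical_gold = ["Sbisco de Olesnicza", "Czescz", "Prandote de Sproua"]
print(score(extract_persons(llm_output), hypothetical_gold))
```

Exact matching is deliberately strict: a model that tags only part of a name gets no credit for it, which is worth keeping in mind when reading the resulting numbers.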
We intend to include this in the upcoming paper. The results are quite promising even with smaller models (local llama3:8B):

```
Testes Prandote de <Person>Sproua</Person> contra <Person>Sbisconem</Person> : <Person>Sbisco de Olesnicza</Person>, quod scit et testatur <Person>Czescz</Person>, <Person>Sbisco de Marczinczouicz</Person> in testimonium, <Person>Stanislaus Lantka</Person> in testimonium, <Person>Wirzchoslaus</Person> in testimonium, <Person>Benik de Thopola</Person>, <Person>Thomas de Tzsczenecz</Person>
```
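To see where two models disagree on the same passage, one option is to diff the sets of tagged names. This is a minimal sketch assuming the outputs use `<person>` tags in any letter case; `compare_outputs` is a hypothetical helper. The usage below takes the first clause of each output quoted in this thread.

```python
import re

# Case-insensitive so that <person> (llama3-70b output) and <Person>
# (llama3:8B output) are both recognised.
PERSON_TAG = re.compile(r"<person>(.*?)</person>", re.IGNORECASE | re.DOTALL)


def tagged_names(text: str) -> set[str]:
    """Set of tagged names with whitespace normalised."""
    return {" ".join(m.split()) for m in PERSON_TAG.findall(text)}


def compare_outputs(a: str, b: str) -> dict[str, list[str]]:
    """Names tagged by both models, and names tagged by only one of them."""
    names_a, names_b = tagged_names(a), tagged_names(b)
    return {
        "both": sorted(names_a & names_b),
        "only_a": sorted(names_a - names_b),
        "only_b": sorted(names_b - names_a),
    }


llama3_70b = "Testes <person>Prandote de Sproua</person> contra <person>Sbisconem</person> :"
llama3_8b = "Testes Prandote de <Person>Sproua</Person> contra <Person>Sbisconem</Person> :"
print(compare_outputs(llama3_70b, llama3_8b))
```

On this clause the two models agree on "Sbisconem", while the larger model tags "Prandote de Sproua" and the smaller one tags only "Sproua".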