Feedback on German Language Interview Questions: Issue with Punctuation

#11
by itisdom - opened

Hello,

I've been using this model for correcting interview transcriptions in German and I've noticed a peculiar pattern that I wanted to share with the community and the developers. It seems that the model exhibits a strong bias towards using periods (full stops) instead of question marks at the end of questions. This happens quite frequently, leading to a slight misrepresentation in the tone and clarity of the questions being asked.

Unfortunately, I am unable to share specific examples as they are not suitable for publication.

I understand that language processing, especially in multilingual contexts, can be quite challenging. However, I believe addressing this particular issue could significantly improve the model's performance for German language interviews.

I thought it was important to bring this to your attention. I appreciate all the hard work that went into this model, as it has greatly improved and complemented our interview transcription process!

Thank you for considering this feedback.

Best regards.

Hi @itisdom,
thank you for taking the time to write this feedback. The behavior that you observed is plausible, since question marks are the least frequent punctuation markers in the training dataset.
Here is the actual distribution from the paper:
[Image: distribution of punctuation markers in the training dataset]

One solution could be to create a new dataset that contains more questions.
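
As a rough illustration of that idea, here is a minimal Python sketch that counts punctuation markers in a list of sentences and then oversamples the ones ending in a question mark. The marker set, the function names `punctuation_counts` and `oversample_questions`, and the toy corpus are all assumptions for illustration; they are not taken from the paper or from the model's training code. Duplicating existing questions is also only a naive stand-in for collecting genuinely new question data.

```python
import random
from collections import Counter

# Assumed set of markers to count; the model's real label set may differ.
MARKS = ".,?!:;"


def punctuation_counts(sentences):
    """Count how often each marker in MARKS occurs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ch for ch in sentence if ch in MARKS)
    return counts


def oversample_questions(sentences, factor=3, seed=0):
    """Return a shuffled copy where each sentence ending in '?' appears `factor` times."""
    rng = random.Random(seed)
    result = []
    for sentence in sentences:
        copies = factor if sentence.rstrip().endswith("?") else 1
        result.extend([sentence] * copies)
    rng.shuffle(result)
    return result


# Toy corpus (hypothetical, not real interview data).
corpus = [
    "Wie heißen Sie?",
    "Ich heiße Anna.",
    "Ich komme aus Berlin, und ich arbeite dort.",
    "Wo wohnen Sie?",
]

before = punctuation_counts(corpus)
balanced = oversample_questions(corpus, factor=3)
after = punctuation_counts(balanced)

print("before:", dict(before))
print("after: ", dict(after))
```

Comparing the two counts shows how much the share of question marks shifts for a given `factor`; in practice the factor would need tuning so that the model does not start over-predicting question marks.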

Best,
Oliver
