The paper judges the effectiveness of this approach only through perplexity. Perplexity essentially measures how surprised the language model is when predicting each token: if the model generates tokens at random, perplexity is very high, whereas if it is confident about a small set of likely next tokens, perplexity is low. But inserting a predefined fixed token after every token makes half of the predictions trivial (the model always knows the fixed token is coming), so the averaged perplexity is bound to drop. Isn't that expected, rather than evidence that the approach works?
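A minimal sketch (not from the paper) of the arithmetic behind this objection, assuming perplexity is computed as the exponential of the mean per-token negative log-likelihood and that the inserted fixed token is predicted with probability close to 1; the probability values below are made up for illustration:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-likelihood over the tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities a model assigns to the tokens of a normal sequence.
normal = [0.2, 0.1, 0.3, 0.25, 0.15, 0.4]
print(perplexity(normal))        # baseline perplexity, ~4.7

# Interleave a fixed, almost fully predictable token (probability ~1) after
# every real token: its NLL is ~0, so it only dilutes the average.
interleaved = []
for p in normal:
    interleaved += [p, 0.999]
print(perplexity(interleaved))   # ~2.2, much lower without any real improvement
```

With a near-certain token at every other position, the mean negative log-likelihood is roughly halved, so the reported perplexity falls toward the square root of the baseline even though the predictions on the real tokens have not changed at all.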
Ritwik Mishra (ritwikm)