If we have a tokenized sequence \(X = (x_0, x_1, \dots, x_t)\), then the perplexity of \(X\) is, $$\text{PPL}(X) = \exp \left{ {-\frac{1}{t}\sum_i^t \log p_\theta (x_i|x_{