    # the model only calculates loss over trg_len - 1 labels, because it internally shifts the labels
    # to the left by 1.
    neg_log_likelihood = outputs.loss
    nlls.append(neg_log_likelihood)
    prev_end_loc = end_loc
    if end_loc == seq_len:
        break
ppl = torch.exp(torch.stack(nlls).mean())
Running this with the stride length equal to the max input length is equivalent to the suboptimal, non-sliding-window
strategy we discussed above.
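
To make the effect of the stride concrete, here is a minimal sketch in plain Python (no model involved) of how the window boundaries might be computed. The names end_loc, prev_end_loc, seq_len and trg_len follow the fragment above; the function names, begin_loc, max_length, stride and the toy sizes are assumptions chosen for illustration. The perplexity helper mirrors the final line of the fragment by exponentiating the mean of the per-window losses.

```python
import math


def sliding_windows(seq_len, max_length, stride):
    """Yield (begin_loc, end_loc, trg_len) for each evaluation window.

    Illustrative helper, not a library function.
    """
    prev_end_loc = 0
    for begin_loc in range(0, seq_len, stride):
        end_loc = min(begin_loc + max_length, seq_len)
        # Only tokens not covered by the previous window are scored;
        # the rest of the window serves as context.
        trg_len = end_loc - prev_end_loc
        yield begin_loc, end_loc, trg_len
        prev_end_loc = end_loc
        if end_loc == seq_len:
            break


def perplexity(nlls):
    """Exponentiate the mean of per-window negative log-likelihoods."""
    return math.exp(sum(nlls) / len(nlls))


# Toy sizes (assumed): 10 tokens, windows of at most 4 tokens.
overlapping = list(sliding_windows(seq_len=10, max_length=4, stride=2))
disjoint = list(sliding_windows(seq_len=10, max_length=4, stride=4))

print(overlapping)  # [(0, 4, 4), (2, 6, 2), (4, 8, 2), (6, 10, 2)]
print(disjoint)     # [(0, 4, 4), (4, 8, 4), (8, 10, 2)]
```

With stride equal to max_length, every window scores all of its own tokens (trg_len equals end_loc - begin_loc), so no token is predicted with context carried over from an earlier window. With a smaller stride, each window after the first scores only its last stride tokens and uses the preceding ones as context.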