The smaller the stride, the more context the model will have in making each prediction, and the better the reported perplexity will typically be.