This means that the model will have at least 512 tokens for context when calculating the conditional likelihood of any one token (provided there are 512 preceding tokens available to condition on).