Checkpoint "step115000-tokens482B" identical to main model?
#6, opened by amodaresi
I have also noticed that the nitro model is identical to the "step651581-tokens2731B" checkpoint.
What exactly is the nitro revision? Is it the model before it's further tuned with learning rate annealing, as described in the paper?
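
For anyone who wants to reproduce the comparison, here is a minimal sketch of one way to check it, assuming the weight files of each revision have already been downloaded into separate local folders. The helper names, the folder layout, and the `*.safetensors` pattern are illustrative assumptions, not anything taken from the repository.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_weights(dir_a, dir_b, pattern="*.safetensors"):
    """True if both folders hold the same file names with identical bytes.

    The pattern is an assumption; adjust it to whatever weight format
    the downloaded revisions actually use.
    """
    files_a = {p.name: sha256_of_file(p) for p in Path(dir_a).glob(pattern)}
    files_b = {p.name: sha256_of_file(p) for p in Path(dir_b).glob(pattern)}
    # An empty match on both sides is treated as "not comparable", not "equal".
    return bool(files_a) and files_a == files_b
```

Calling `same_weights` on two such folders (for example, hypothetical local copies of the main revision and of "step115000-tokens482B") returns `True` only when every matched file is byte-for-byte the same. Matching digests mean the stored weights are identical; differing digests do not by themselves prove the tensors differ, since file metadata could change the bytes.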