INFO:__main__: Optimizer = adafactor
INFO:__main__: Learning rate (peak) = 0.009
INFO:__main__: Num examples = 94558172
INFO:__main__: Num tokenized group examples 109037136
INFO:__main__: Num Epochs = 1
INFO:__main__: Instantaneous batch size per device = 4
INFO:__main__: Total train batch size (w. parallel & grad accum) = 512
INFO:__main__: Steps per epoch = 212963 (x grad accum (16) = 3407408)
INFO:__main__: Total optimization steps = 212963 (x grad accum (16) = 3407408)
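
The step counts in the log can be cross-checked with a little arithmetic. The sketch below is not part of the training script. It assumes the total batch size is the per-device batch size times the device count times the gradient accumulation steps, and that a trailing partial batch is dropped. The variable names are hypothetical, and the device count is inferred rather than logged.

```python
# Values copied from the log above.
num_grouped_examples = 109_037_136
per_device_batch = 4
grad_accum_steps = 16
total_train_batch = 512

# Assumption: total batch = per-device batch * device count * grad accum steps,
# so the device count can be inferred (it is not stated in the log).
num_devices = total_train_batch // (per_device_batch * grad_accum_steps)

# Assumption: the trailing partial batch is dropped (floor division).
steps_per_epoch = num_grouped_examples // total_train_batch

# Each optimization step consumes grad_accum_steps micro-batches.
micro_steps = steps_per_epoch * grad_accum_steps

print(num_devices, steps_per_epoch, micro_steps)  # prints: 8 212963 3407408
```

Under these assumptions the computed values match the logged 212963 optimization steps and 3407408 accumulated micro-steps, and the batch sizes are consistent with a run on 8 devices.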