bigscience-bot committed on
Commit 0a0a7d2
Parent(s): 511249d
new data
- logs/main_log.txt +66 -0
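The file summary reports +66 -0. The hunk header "@@ -96857,3 +96857,69 @@" says the hunk spans 3 lines of the old file and 69 lines of the new one. Nothing was deleted, so all 3 old lines are unchanged context, and 69 - 3 = 66 added lines, which matches the summary. A minimal Python sketch of that cross-check follows. It assumes standard unified-diff hunk syntax, and the helper names hunk_counts and added_lines_if_no_deletions are hypothetical.

```python
from __future__ import annotations

import re

# Unified-diff hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@ [section]"
# A count of 1 may be omitted, so both count groups are optional.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_counts(header: str) -> tuple[int, int]:
    """Return (old_count, new_count) parsed from a hunk header line."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_count = int(m.group(2)) if m.group(2) is not None else 1
    new_count = int(m.group(4)) if m.group(4) is not None else 1
    return old_count, new_count


def added_lines_if_no_deletions(header: str) -> int:
    """Number of added lines; valid only when the hunk deletes nothing,
    because then every old line is unchanged context."""
    old_count, new_count = hunk_counts(header)
    return new_count - old_count


header = "@@ -96857,3 +96857,69 @@ time (ms)"
print(hunk_counts(header))                  # (3, 69)
print(added_lines_if_no_deletions(header))  # 66
```

If the hunk also contained "-" lines, new_count - old_count would give only the net change, so the sketch restricts itself to the pure-append case shown here.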
logs/main_log.txt
CHANGED
@@ -96857,3 +96857,69 @@ time (ms)
 time (ms)
 iteration 1965/ 292968 | consumed samples: 4024320 | consumed tokens: 459767808 | elapsed time per iteration (ms): 106873.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.921288E+00 | loss scale: 32768.0 | grad norm: 20923.170 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 1966/ 292968 | consumed samples: 4026368 | consumed tokens: 460111872 | elapsed time per iteration (ms): 103668.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.929842E+00 | loss scale: 32768.0 | grad norm: 19834.126 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1967/ 292968 | consumed samples: 4028416 | consumed tokens: 460455936 | elapsed time per iteration (ms): 106664.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.931633E+00 | loss scale: 32768.0 | grad norm: 19386.027 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1968/ 292968 | consumed samples: 4030464 | consumed tokens: 460800000 | elapsed time per iteration (ms): 110508.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.945953E+00 | loss scale: 32768.0 | grad norm: 19908.571 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1969/ 292968 | consumed samples: 4032512 | consumed tokens: 461144064 | elapsed time per iteration (ms): 110069.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.896821E+00 | loss scale: 32768.0 | grad norm: 15035.351 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1970/ 292968 | consumed samples: 4034560 | consumed tokens: 461488128 | elapsed time per iteration (ms): 107170.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.940769E+00 | loss scale: 32768.0 | grad norm: 13950.627 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1971/ 292968 | consumed samples: 4036608 | consumed tokens: 461832192 | elapsed time per iteration (ms): 106511.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.931390E+00 | loss scale: 32768.0 | grad norm: 19245.494 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1972/ 292968 | consumed samples: 4038656 | consumed tokens: 462176256 | elapsed time per iteration (ms): 104143.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.939216E+00 | loss scale: 32768.0 | grad norm: 23053.813 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1973/ 292968 | consumed samples: 4040704 | consumed tokens: 462520320 | elapsed time per iteration (ms): 106138.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.959975E+00 | loss scale: 32768.0 | grad norm: 22524.458 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1974/ 292968 | consumed samples: 4042752 | consumed tokens: 462864384 | elapsed time per iteration (ms): 105586.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.905755E+00 | loss scale: 32768.0 | grad norm: 19440.251 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1975/ 292968 | consumed samples: 4044800 | consumed tokens: 463208448 | elapsed time per iteration (ms): 106158.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.915691E+00 | loss scale: 32768.0 | grad norm: 17649.388 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1976/ 292968 | consumed samples: 4046848 | consumed tokens: 463552512 | elapsed time per iteration (ms): 106708.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.920288E+00 | loss scale: 32768.0 | grad norm: 20503.069 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1977/ 292968 | consumed samples: 4048896 | consumed tokens: 463896576 | elapsed time per iteration (ms): 105936.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.945108E+00 | loss scale: 32768.0 | grad norm: 16839.813 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1978/ 292968 | consumed samples: 4050944 | consumed tokens: 464240640 | elapsed time per iteration (ms): 105458.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.917942E+00 | loss scale: 32768.0 | grad norm: 15257.276 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1979/ 292968 | consumed samples: 4052992 | consumed tokens: 464584704 | elapsed time per iteration (ms): 107165.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.927221E+00 | loss scale: 32768.0 | grad norm: 15093.813 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1980/ 292968 | consumed samples: 4055040 | consumed tokens: 464928768 | elapsed time per iteration (ms): 113081.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.957678E+00 | loss scale: 32768.0 | grad norm: 13839.536 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1981/ 292968 | consumed samples: 4057088 | consumed tokens: 465272832 | elapsed time per iteration (ms): 108714.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.917398E+00 | loss scale: 32768.0 | grad norm: 14074.082 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1982/ 292968 | consumed samples: 4059136 | consumed tokens: 465616896 | elapsed time per iteration (ms): 107604.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.925085E+00 | loss scale: 32768.0 | grad norm: 13534.880 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1983/ 292968 | consumed samples: 4061184 | consumed tokens: 465960960 | elapsed time per iteration (ms): 112383.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.944923E+00 | loss scale: 32768.0 | grad norm: 13209.445 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1984/ 292968 | consumed samples: 4063232 | consumed tokens: 466305024 | elapsed time per iteration (ms): 112954.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.918631E+00 | loss scale: 32768.0 | grad norm: 19787.184 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1985/ 292968 | consumed samples: 4065280 | consumed tokens: 466649088 | elapsed time per iteration (ms): 111797.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.935518E+00 | loss scale: 32768.0 | grad norm: 17837.294 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1986/ 292968 | consumed samples: 4067328 | consumed tokens: 466993152 | elapsed time per iteration (ms): 110679.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.927701E+00 | loss scale: 32768.0 | grad norm: 24145.327 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1987/ 292968 | consumed samples: 4069376 | consumed tokens: 467337216 | elapsed time per iteration (ms): 106586.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.924149E+00 | loss scale: 32768.0 | grad norm: 19059.242 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1988/ 292968 | consumed samples: 4071424 | consumed tokens: 467681280 | elapsed time per iteration (ms): 104497.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.911625E+00 | loss scale: 32768.0 | grad norm: 15092.949 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1989/ 292968 | consumed samples: 4073472 | consumed tokens: 468025344 | elapsed time per iteration (ms): 104962.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.930661E+00 | loss scale: 32768.0 | grad norm: 19898.790 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1990/ 292968 | consumed samples: 4075520 | consumed tokens: 468369408 | elapsed time per iteration (ms): 104607.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.931398E+00 | loss scale: 32768.0 | grad norm: 18910.425 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1991/ 292968 | consumed samples: 4077568 | consumed tokens: 468713472 | elapsed time per iteration (ms): 103902.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.927662E+00 | loss scale: 32768.0 | grad norm: 16632.425 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1992/ 292968 | consumed samples: 4079616 | consumed tokens: 469057536 | elapsed time per iteration (ms): 106519.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.915715E+00 | loss scale: 32768.0 | grad norm: 13302.984 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1993/ 292968 | consumed samples: 4081664 | consumed tokens: 469401600 | elapsed time per iteration (ms): 105643.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.921783E+00 | loss scale: 32768.0 | grad norm: 16160.708 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1994/ 292968 | consumed samples: 4083712 | consumed tokens: 469745664 | elapsed time per iteration (ms): 104271.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.939743E+00 | loss scale: 32768.0 | grad norm: 19586.680 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1995/ 292968 | consumed samples: 4085760 | consumed tokens: 470089728 | elapsed time per iteration (ms): 105935.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.918940E+00 | loss scale: 32768.0 | grad norm: 18793.983 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1996/ 292968 | consumed samples: 4087808 | consumed tokens: 470433792 | elapsed time per iteration (ms): 105026.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.930414E+00 | loss scale: 32768.0 | grad norm: 16737.588 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1997/ 292968 | consumed samples: 4089856 | consumed tokens: 470777856 | elapsed time per iteration (ms): 104382.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.952893E+00 | loss scale: 32768.0 | grad norm: 13563.057 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1998/ 292968 | consumed samples: 4091904 | consumed tokens: 471121920 | elapsed time per iteration (ms): 106021.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.901303E+00 | loss scale: 32768.0 | grad norm: 15104.265 | num zeros: 0.0 | curriculum seqlen: 168 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
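Each iteration line is a "|"-separated list of "key: value" fields. Between consecutive iterations, consumed samples grows by the global batch size (4026368 - 4024320 = 2048). Consumed tokens grows by global batch size x curriculum seqlen (2048 x 168 = 344064, matching 460111872 - 459767808). Iteration 1966 processed those 344064 tokens in 103668.6 ms, which is roughly 3,300 tokens/s across the whole job.

The sketch below checks these relationships mechanically. It is a hypothetical parser: parse_iteration_line and check_step are illustrative names, not part of any training framework. It also assumes two things: the token delta is governed by the current iteration's curriculum seqlen, and "lm loss" is a natural-log cross-entropy, so exp(lm loss) approximates perplexity.

```python
import math
import re


def parse_iteration_line(line: str) -> dict:
    """Parse one 'iteration N/ M | key: value | ...' log line into a dict.

    Hypothetical helper: field names are taken verbatim from the log text,
    and every value after the iteration counter is read as a float.
    """
    m = re.match(r"\s*iteration\s+(\d+)/\s*(\d+)", line)
    if m is None:
        raise ValueError(f"not an iteration line: {line[:40]!r}")
    fields = {"iteration": int(m.group(1)), "train_iters": int(m.group(2))}
    # Fields are separated by '|'; the first chunk holds the counter, the last is empty.
    for chunk in line.split("|")[1:]:
        if ":" not in chunk:
            continue
        key, value = chunk.split(":", 1)
        fields[key.strip()] = float(value)
    return fields


def check_step(prev: dict, cur: dict) -> dict:
    """Compare two consecutive iterations.

    Assumes the token delta equals global batch size x the current curriculum
    seqlen, and that 'lm loss' is in nats (so exp() approximates perplexity).
    """
    batch = cur["global batch size"]
    seqlen = cur["curriculum seqlen"]
    d_samples = cur["consumed samples"] - prev["consumed samples"]
    d_tokens = cur["consumed tokens"] - prev["consumed tokens"]
    seconds = cur["elapsed time per iteration (ms)"] / 1000.0
    return {
        "samples_ok": d_samples == batch,
        "tokens_ok": d_tokens == batch * seqlen,
        "tokens_per_sec": d_tokens / seconds,
        "perplexity": math.exp(cur["lm loss"]),
    }


# Two consecutive lines copied from the log above.
LINE_1965 = (
    "iteration 1965/ 292968 | consumed samples: 4024320 | consumed tokens: 459767808 | "
    "elapsed time per iteration (ms): 106873.1 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.921288E+00 | loss scale: 32768.0 | "
    "grad norm: 20923.170 | num zeros: 0.0 | curriculum seqlen: 168 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_1966 = (
    "iteration 1966/ 292968 | consumed samples: 4026368 | consumed tokens: 460111872 | "
    "elapsed time per iteration (ms): 103668.6 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.929842E+00 | loss scale: 32768.0 | "
    "grad norm: 19834.126 | num zeros: 0.0 | curriculum seqlen: 168 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)

result = check_step(parse_iteration_line(LINE_1965), parse_iteration_line(LINE_1966))
print(result["samples_ok"], result["tokens_ok"])  # True True
print(round(result["tokens_per_sec"]), round(result["perplexity"], 1))
```

Curriculum seqlen is expected to grow as training proceeds, so the per-iteration time measured at seqlen 168 should not be extrapolated over the remaining iterations out of 292968.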