UsernameJustAnother/Nemo-12B-Marlin-v6

UsernameJustAnother committed on Aug 17

Commit d4d67c1 • 1 Parent(s): 3e6f1b0

Update README.md

Files changed (1)

README.md +2 -0

@@ -29,6 +29,8 @@ New for v6:
 - Different learning rate and back to Celeste's scaling factor setup (but Celeste trained on -base, this is -instruct).
 - Now with added eval! I worked out how to get eval stats (and wandb) set up, so now I can see my failures in graphical form.
+I pulled v7 because I honestly don't think it's as good as v6, and don't want folks to get the wrong idea that it's better just because the version number is higher.
 And of course yay Unsloth for letting this all train on a single A100 with variable (wildly variable) context length.
 Here's what the train/eval loss looked like (eval is orange, train is blue). I think that's not terrible, but :shrug:.