Looks like it was overfitting a lot: around step 400 the training loss collapses from ~1.86 to ~1.23, while the validation loss bottoms out at 1.8204 (step 390) and then drifts back up.

https://wandb.ai/paul-stansifer/huggingface/runs/q00wn442?nw=nwuserpaulstansifer

Step	Training Loss	Validation Loss
10	1.972100	1.852683
20	1.838500	1.834271
30	1.847000	1.839722
40	1.820000	1.843212
50	1.831400	1.831898
60	1.872500	1.846977
70	1.887800	1.851411
80	1.860700	1.850182
90	1.911900	1.858088
100	1.904800	1.865184
110	1.849400	1.843305
120	1.859100	1.858299
130	1.918400	1.849470
140	1.848100	1.851458
150	1.863300	1.992230
160	2.021300	1.855798
170	1.909800	1.862519
180	1.899800	1.859040
190	1.869900	1.859638
200	1.925400	1.855649
210	1.889200	1.865452
220	1.884400	1.854306
230	1.932100	1.859568
240	1.904300	1.852759
250	1.895900	1.865029
260	1.877200	1.856257
270	1.892300	1.853710
280	1.939900	1.847225
290	1.859100	1.843517
300	1.828300	1.846446
310	1.916700	1.841959
320	1.882100	1.842111
330	1.876000	1.836245
340	1.890200	1.840429
350	1.905600	1.830070
360	1.914500	1.826384
370	1.856300	1.823096
380	1.860700	1.825345
390	1.880300	1.820421
400	1.719800	1.822181
410	1.230200	1.872961
420	1.317400	1.876906
430	1.368800	1.864715
440	1.402900	1.861143
450	1.338500	1.852673
460	1.300500	1.851561
470	1.340900	1.864635
480	1.352300	1.859155
490	1.325700	1.865095
500	1.396100	1.857414
510	1.312600	1.884443
520	1.371300	1.848179
530	1.372700	1.847898
540	1.414700	1.848458
550	1.319600	1.842025
560	1.393200	1.839077
570	1.324500	1.845762
580	1.347000	1.830255
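Given how flat the validation curve is, patience-based early stopping would have ended this run long before the training loss diverged. A minimal sketch in plain Python, using the validation losses from the table above (the patience value is an illustrative choice, not a setting from this run):

```python
# Validation losses from the table above, logged every 10 steps (steps 10..580).
EVAL_EVERY = 10
val_losses = [
    1.852683, 1.834271, 1.839722, 1.843212, 1.831898, 1.846977, 1.851411,
    1.850182, 1.858088, 1.865184, 1.843305, 1.858299, 1.849470, 1.851458,
    1.992230, 1.855798, 1.862519, 1.859040, 1.859638, 1.855649, 1.865452,
    1.854306, 1.859568, 1.852759, 1.865029, 1.856257, 1.853710, 1.847225,
    1.843517, 1.846446, 1.841959, 1.842111, 1.836245, 1.840429, 1.830070,
    1.826384, 1.823096, 1.825345, 1.820421, 1.822181, 1.872961, 1.876906,
    1.864715, 1.861143, 1.852673, 1.851561, 1.864635, 1.859155, 1.865095,
    1.857414, 1.884443, 1.848179, 1.847898, 1.848458, 1.842025, 1.839077,
    1.845762, 1.830255,
]

def early_stop_step(losses, patience, eval_every=EVAL_EVERY):
    """Return the step at which patience-based early stopping would halt,
    or None if the run finishes without triggering."""
    best = float("inf")
    bad = 0  # consecutive evals without a new best
    for i, loss in enumerate(losses):
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                return (i + 1) * eval_every
    return None

print(early_stop_step(val_losses, patience=3))  # → 80
```

With patience=3 this run halts at step 80, where the best eval loss so far (1.8319) is already within ~0.01 of the run's global minimum (1.8204 at step 390). In a `transformers`/TRL setup the equivalent is `EarlyStoppingCallback` together with `load_best_model_at_end=True` and `metric_for_best_model="eval_loss"`.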

Uploaded model

  • Developed by: paul-stansifer
  • License: apache-2.0
  • Finetuned from model: unsloth/mistral-7b-bnb-4bit

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Downloads last month: 61
Model size: 7.24B params
Architecture: llama
Format: GGUF (4-bit and 8-bit quantizations available)

Model tree for paul-stansifer/qw-mistral-1e-3-7b-overfit-gguf

  • Quantized (126) → this model