Finetuning setup
#4, opened by Andriy
Hi! We are trying to reproduce your experiment. I see that you finetuned this model on 8xH100 with 4k-token inputs. What setup allowed you to fit the model, the gradients, and the inputs into 8x80GB? Was it DeepSpeed or something else? Could you please share some details? Thanks!