Ahmadzei's picture
update 1
57bdca5
raw
history blame contribute delete
641 Bytes
Mismatches may cause the training to fail in very difficult to detect ways!
Some configuration parameters specific to DeepSpeed only which need to be manually set based on your training needs.
You could also modify the DeepSpeed configuration and edit [TrainingArguments] from it:
Create or load a DeepSpeed configuration to used as the main configuration
Create a [TrainingArguments] object based on these DeepSpeed configuration values
Some values, such as scheduler.params.total_num_steps are calculated by the [Trainer] during training.
ZeRO configuration
There are three configurations, each corresponding to a different ZeRO stage.