When offload_optimizer is enabled, you can use a non-DeepSpeed optimizer (except for LAMB) as long as it has both a CPU and a GPU implementation.
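The rule above can be written as a small pre-flight check. This is a minimal sketch, not part of DeepSpeed or Transformers. The helper names `offload_enabled` and `can_use_non_deepspeed_optimizer` are hypothetical, and whether an optimizer has CPU and GPU implementations is passed in by the caller rather than detected. The config layout (`zero_optimization` → `offload_optimizer` → `device`) follows the DeepSpeed JSON config. The sketch treats offloading as enabled when `device` is `"cpu"` or `"nvme"`.

```python
from typing import Any, Dict


def offload_enabled(ds_config: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if the DeepSpeed config offloads optimizer state.

    Assumes offloading is on when zero_optimization.offload_optimizer.device
    is "cpu" or "nvme"; a missing section or device "none" counts as off.
    """
    zero = ds_config.get("zero_optimization", {})
    device = zero.get("offload_optimizer", {}).get("device", "none")
    return device in ("cpu", "nvme")


def can_use_non_deepspeed_optimizer(
    ds_config: Dict[str, Any],
    optimizer_name: str,
    has_cpu_impl: bool,
    has_gpu_impl: bool,
) -> bool:
    """Hypothetical check of a non-DeepSpeed optimizer against the offload rule.

    The capability flags are supplied by the caller (an assumption of this
    sketch); nothing here inspects the optimizer itself.
    """
    if not offload_enabled(ds_config):
        # The restriction stated above only applies when offload_optimizer is enabled.
        return True
    if optimizer_name.lower() == "lamb":
        # LAMB is explicitly excluded when offloading.
        return False
    # Otherwise the optimizer needs both a CPU and a GPU implementation.
    return has_cpu_impl and has_gpu_impl


if __name__ == "__main__":
    config = {
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        }
    }
    print(can_use_non_deepspeed_optimizer(config, "AdamW", True, True))        # True
    print(can_use_non_deepspeed_optimizer(config, "LAMB", True, True))         # False
    print(can_use_non_deepspeed_optimizer(config, "GpuOnlyOpt", False, True))  # False
```

Keeping the capability flags as explicit arguments avoids guessing how a given optimizer library exposes its CPU and GPU kernels.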