Sharding strategy

FSDP offers a number of sharding strategies to select from:

FULL_SHARD - shard model parameters, gradients and optimizer states across workers; select 1 for this option
SHARD_GRAD_OP - shard gradients and optimizer states across workers; select 2 for this option
NO_SHARD - don't shard anything (this is equivalent to DDP); select 3 for this option
HYBRID_SHARD - shard model parameters, gradients and optimizer states within each node, while each node holds a full copy of the model; select 4 for this option
HYBRID_SHARD_ZERO2 - shard gradients and optimizer states within each node, while each node holds a full copy of the model; select 5 for this option

The strategy is set with the fsdp_sharding_strategy flag.
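
The numbered options above can be sketched as a small lookup. This is a minimal illustration using only the Python standard library, assuming the numbers 1 to 5 map to the strategies in the order listed; the names ShardingChoice, parse_sharding_choice and shards_parameters are hypothetical helpers written for this example, not part of any library.

```python
from enum import IntEnum


class ShardingChoice(IntEnum):
    """Hypothetical mirror of the numbered options listed above."""
    FULL_SHARD = 1
    SHARD_GRAD_OP = 2
    NO_SHARD = 3
    HYBRID_SHARD = 4
    HYBRID_SHARD_ZERO2 = 5


def parse_sharding_choice(value):
    """Accept an option number (1-5) or a strategy name; return the enum member."""
    text = str(value).strip()
    if text.isdigit():
        # Raises ValueError for numbers outside 1-5.
        return ShardingChoice(int(text))
    try:
        return ShardingChoice[text.upper()]
    except KeyError:
        raise ValueError(f"unknown sharding strategy: {value!r}") from None


def shards_parameters(choice):
    """True when the strategy shards model parameters, per the descriptions above."""
    return choice in (ShardingChoice.FULL_SHARD, ShardingChoice.HYBRID_SHARD)
```

For example, parse_sharding_choice("4") and parse_sharding_choice("hybrid_shard") both resolve to the same member, so a value passed through the fsdp_sharding_strategy flag could be validated before training starts, whichever form the user typed.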