Sharding strategy
FSDP offers a number of sharding strategies to select from:

FULL_SHARD - shards model parameters, gradients and optimizer states across workers; select 1 for this option
SHARD_GRAD_OP - shards gradients and optimizer states across workers; select 2 for this option
NO_SHARD - doesn't shard anything (this is equivalent to DDP); select 3 for this option
HYBRID_SHARD - shards model parameters, gradients and optimizer states within each node, while each node keeps a full copy of the model; select 4 for this option
HYBRID_SHARD_ZERO2 - shards gradients and optimizer states within each node, while each node keeps a full copy of the model; select 5 for this option

The strategy is selected with the fsdp_sharding_strategy flag.
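
The option numbers above can be kept next to their names in a small lookup. The following is a minimal sketch in plain Python: `ShardingOption`, `resolve_strategy` and `shards_parameters` are hypothetical helper names introduced here for illustration, not part of any library, and the numbering simply mirrors the list in this section.

```python
from enum import IntEnum


class ShardingOption(IntEnum):
    """Option numbers from the list above (illustrative helper, not a library API)."""
    FULL_SHARD = 1
    SHARD_GRAD_OP = 2
    NO_SHARD = 3
    HYBRID_SHARD = 4
    HYBRID_SHARD_ZERO2 = 5


def resolve_strategy(value):
    """Accept an option number (int or digit string) or a strategy name."""
    if isinstance(value, ShardingOption):
        return value
    text = str(value).strip()
    if text.isdigit():
        # Raises ValueError for numbers outside 1-5.
        return ShardingOption(int(text))
    try:
        return ShardingOption[text.upper()]
    except KeyError:
        raise ValueError(f"unknown sharding strategy: {value!r}") from None


def shards_parameters(option):
    """True for the strategies that also shard model parameters."""
    return option in (ShardingOption.FULL_SHARD, ShardingOption.HYBRID_SHARD)
```

For example, `resolve_strategy(1).name` gives `'FULL_SHARD'`, and `shards_parameters(resolve_strategy("SHARD_GRAD_OP"))` is `False`, since that strategy shards only gradients and optimizer states.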