Even worse, if you launch distributed training with torch.distributed, each process loads the pretrained model and keeps these two copies in RAM. Note that the randomly created model is initialized with "empty" tensors, which reserve memory without filling it, so the "random" values are simply whatever happened to be in that chunk of memory at the time.
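
To get a feel for how quickly this adds up, here is a rough back-of-the-envelope sketch in plain Python. It assumes each process holds exactly two full copies of the weights while loading, as described above. The function name `peak_load_ram_bytes`, the parameter count, and the process count are hypothetical, chosen only for illustration; real usage also depends on dtype, framework overhead, and how the checkpoint is stored.

```python
def peak_load_ram_bytes(
    num_params: int,
    bytes_per_param: int = 4,      # assumption: float32 weights
    num_processes: int = 1,
    copies_per_process: int = 2,   # the two in-RAM copies described above
) -> int:
    """Rough host RAM needed while every process loads the checkpoint."""
    model_bytes = num_params * bytes_per_param
    return model_bytes * copies_per_process * num_processes


# Hypothetical example: 6 billion float32 parameters, 8 processes on one node.
total = peak_load_ram_bytes(6_000_000_000, bytes_per_param=4, num_processes=8)
print(f"{total / 1e9:.0f} GB")  # prints "384 GB"
```

The estimate scales linearly with the number of processes, which is why the problem is most visible when several processes on the same machine load the model at once.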