T

Tensor Parallelism (TP)
Parallelism technique for training on multiple GPUs in which each tensor is split up into multiple chunks. Instead of having the whole tensor reside on a single GPU, each shard of the tensor resides on its designated GPU.
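A minimal sketch of the idea in plain PyTorch (assuming two available devices, `cuda:0` and `cuda:1`; real tensor-parallel training would use a dedicated library rather than this hand-rolled split): the weight of a linear layer is chunked along its output dimension, each GPU holds only its shard and computes a partial result, and the partial results are gathered back into the full output.

```python
import torch

# Illustrative assumption: two GPUs are available.
devices = ["cuda:0", "cuda:1"]

hidden, out_features = 8, 16
full_weight = torch.randn(out_features, hidden)

# Split the weight along the output dimension; each GPU keeps only its shard.
shards = [w.to(d) for w, d in zip(torch.chunk(full_weight, len(devices), dim=0), devices)]

x = torch.randn(4, hidden)

# Each GPU computes a partial output from its own shard of the weight...
partial_outputs = [x.to(d) @ w.T for w, d in zip(shards, devices)]

# ...and the partial outputs are gathered to form the same result as x @ full_weight.T.
y = torch.cat([p.to(devices[0]) for p in partial_outputs], dim=1)
assert y.shape == (4, out_features)
```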