|
In the second benchmark we set `NCCL_P2P_DISABLE=1`, which disables NCCL's peer-to-peer transport so the GPUs cannot communicate directly over NVLink and must fall back to PCIe.
|
Here is the full benchmark code and outputs: |
|
```bash
# DDP w/ NVLink

rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 torchrun \
--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

{'train_runtime': 101.9003, 'train_samples_per_second': 1.963, 'epoch': 0.69}

# DDP w/o NVLink

rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_P2P_DISABLE=1 torchrun \
--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

{'train_runtime': 131.4367, 'train_samples_per_second': 1.522, 'epoch': 0.69}
```

With NVLink enabled, training completes ~23% faster (`train_runtime` of 101.9s vs. 131.4s).
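
If you want to confirm which transport NCCL actually picks on your machine, you can rerun the benchmark with `NCCL_DEBUG=INFO`, a standard NCCL environment variable that logs transport selection at startup. A minimal sketch, reusing the command above; `--max_steps 10` is an assumption here, since a short run is enough to see the startup logs, and the exact log wording varies across NCCL versions:

```bash
# Sketch: same benchmark with NCCL transport logging enabled.
# In the startup logs, look for "via P2P" (NVLink / peer-to-peer)
# vs. "via SHM" (shared-host-memory fallback); exact strings depend
# on the NCCL version.
rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 NCCL_DEBUG=INFO torchrun \
--nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py --model_name_or_path openai-community/gpt2 \
--dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train \
--output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 10
```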
|
|
|
Hardware: 2x TITAN RTX (24GB each), connected by 2 NVLinks (`NV2` in `nvidia-smi topo -m`)
|
Software: `pytorch-1.8-to-be` + `cuda-11.0` / `transformers==4.3.0.dev0`.
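
To check what the interconnect looks like on your own machine, `nvidia-smi topo -m` prints the GPU-to-GPU connectivity matrix; this is where the `NV2` reading above comes from:

```bash
# Print the interconnect matrix for all visible GPUs.
# NV2 = connected by two NVLinks; PIX/PXB/PHB = various PCIe paths;
# SYS = traversing the inter-socket interconnect.
nvidia-smi topo -m
```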