"Step4: Build ONNX Runtime from source" not working

#2
by kartikpodugu - opened

OS - Windows 11
Running under WSL 2
GPU - NVIDIA RTX 3060

I followed the steps exactly as described on the model card page.
For step 4, since I have an RTX 3060, I used CMAKE_CUDA_ARCHITECTURES=86, which I found in the NVIDIA documentation. The comments below also say it is 86 for the 3090, so I am assuming it is correct for the 3060 as well.
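CMAKE_CUDA_ARCHITECTURES takes the GPU's compute capability with the dot removed, so compute capability 8.6 (RTX 30-series cards such as the 3060 and 3090) becomes 86. A minimal Python sketch of that conversion, assuming a driver recent enough that nvidia-smi supports --query-gpu=compute_cap; both helper names are hypothetical:

```python
import shutil
import subprocess


def compute_cap_to_cuda_arch(compute_cap: str) -> str:
    """Turn a compute capability such as "8.6" into the CMAKE_CUDA_ARCHITECTURES form "86"."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def query_cuda_archs() -> list[str]:
    """Hypothetical helper: ask nvidia-smi for each visible GPU's compute capability.

    Assumes a driver that supports --query-gpu=compute_cap; returns an empty list
    if nvidia-smi is missing, fails, or prints something unexpected.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
        return [
            compute_cap_to_cuda_arch(line)
            for line in result.stdout.splitlines()
            if line.strip()
        ]
    except (OSError, subprocess.CalledProcessError, ValueError):
        return []


if __name__ == "__main__":
    print(compute_cap_to_cuda_arch("8.6"))  # 86, the value for an RTX 3060
    print(query_cuda_archs())  # one entry per visible GPU, or [] if the query fails
```

Passing 75 instead builds code for an older architecture (Turing), which is why it is not the right value for this card.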

I am unable to compile from source and get the following error:

nvcc error : 'cicc' died due to signal 9 (Kill signal)
gmake[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:5779: CMakeFiles/onnxruntime_providers_cuda.dir/workspace/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_hdim128_fp16_sm80.cu.o] Error 9
gmake[2]: *** Waiting for unfinished jobs....
nvcc error : 'cicc' died due to signal 9 (Kill signal)
gmake[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:5794: CMakeFiles/onnxruntime_providers_cuda.dir/workspace/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_hdim160_bf16_sm80.cu.o] Error 9
nvcc error : 'cicc' died due to signal 9 (Kill signal)
gmake[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:5809: CMakeFiles/onnxruntime_providers_cuda.dir/workspace/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_hdim160_fp16_sm80.cu.o] Error 9
nvcc error : 'cicc' died due to signal 9 (Kill signal)
gmake[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:5764: CMakeFiles/onnxruntime_providers_cuda.dir/workspace/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_hdim128_bf16_sm80.cu.o] Error 9
nvcc error : 'cicc' died due to signal 9 (Kill signal)
gmake[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:5824: CMakeFiles/onnxruntime_providers_cuda.dir/workspace/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_hdim192_bf16_sm80.cu.o] Error 9
nvcc error : 'cicc' died due to signal 9 (Kill signal)
gmake[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:5869: CMakeFiles/onnxruntime_providers_cuda.dir/workspace/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_hdim224_fp16_sm80.cu.o] Error 9
nvcc error : 'cicc' died due to signal 9 (Kill signal)
gmake[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:5884: CMakeFiles/onnxruntime_providers_cuda.dir/workspace/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_hdim256_bf16_sm80.cu.o] Error 9
gmake[1]: *** [CMakeFiles/Makefile2:2029: CMakeFiles/onnxruntime_providers_cuda.dir/all] Error 2
gmake: *** [Makefile:166: all] Error 2
Traceback (most recent call last):
  File "/workspace/tools/ci_build/build.py", line 2918, in
    sys.exit(main())
  File "/workspace/tools/ci_build/build.py", line 2810, in main
    build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
  File "/workspace/tools/ci_build/build.py", line 1702, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/workspace/tools/ci_build/build.py", line 852, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
  File "/workspace/tools/python/util/run.py", line 49, in run
    completed_process = subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', '/workspace/build/Linux/Release', '--config', 'Release', '--', '-j16']' returned non-zero exit status 2.

I need help.

After looking at similar issues online, I tried CMAKE_CUDA_ARCHITECTURES=75. The build from source then succeeded, but the output image from the pipeline is not correct.

CMAKE_CUDA_ARCHITECTURES=86 is correct for the RTX 3060.

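The error is caused by running out of memory. Signal 9 is SIGKILL, which is what the Linux OOM killer sends to a process (here the nvcc front end, cicc) when the system runs out of memory. By default, WSL 2 only uses half of the host machine's RAM unless you configure the limit manually.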
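To see how much memory the build actually has inside WSL 2, you can read MemTotal from /proc/meminfo. A minimal Python sketch, assuming it runs inside the WSL 2 Linux environment; mem_total_gib is a hypothetical helper name:

```python
import os
import signal


def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo text in GiB (the kernel reports it in kB, i.e. KiB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("no MemTotal line found")


if __name__ == "__main__":
    # "'cicc' died due to signal 9": signal 9 is SIGKILL on Linux.
    print(signal.Signals(9).name)  # SIGKILL
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(f"Memory visible to this Linux/WSL 2 system: {mem_total_gib(f.read()):.1f} GiB")
```

If the reported value is about half of the host's RAM, that is the WSL 2 default. It can be raised with a memory= entry under the [wsl2] section of %UserProfile%\.wslconfig on the Windows side, followed by wsl --shutdown so the VM restarts with the new limit.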
As mentioned in step 4, if your machine has less than 64 GB of memory, replace --parallel with --parallel 4 --nvcc_threads 1 to avoid running out of memory. Plain --parallel starts one job per CPU core, which is the -j16 in your log.
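The step 4 rule can be expressed as a small helper that picks the parallelism flags for build.py from the memory visible inside WSL 2. This is a sketch, not part of ONNX Runtime: physical_memory_gib and parallel_build_flags are hypothetical names, the rest of the build.py command from step 4 is left out, and the 64 threshold is compared in GiB as a simplification of "64 GB":

```python
import os


def physical_memory_gib() -> float:
    """Total physical memory visible to this Linux/WSL 2 system, in GiB."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def parallel_build_flags(mem_gib: float, threshold_gib: float = 64.0) -> list[str]:
    """Hypothetical helper: choose build.py parallelism flags per the step 4 advice."""
    if mem_gib < threshold_gib:
        # Fewer concurrent compile jobs and one nvcc thread per job lower peak memory use.
        return ["--parallel", "4", "--nvcc_threads", "1"]
    # Bare --parallel defaults to the CPU count (-j16 on the machine in the log above).
    return ["--parallel"]


if __name__ == "__main__":
    print(" ".join(parallel_build_flags(physical_memory_gib())))
```

On a WSL 2 instance that sees 32 GiB, this prints --parallel 4 --nvcc_threads 1. The value to check is the memory WSL 2 sees, not the host total, since WSL 2 only gets a share of the host's RAM by default.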
