NGSolve with CUDA on Windows - problem with compilation

I am trying to compile NGSolve with CUDA support on Windows. (I am able to compile it without CUDA.)

My working dir is C:/ngsolve-netgen

I am installing to C:/install-CUDA

My build script is:

cd build-CUDA
set OMPLIB="C:/Program Files (x86)/Intel/oneAPI/compiler/2025.3/lib/libiomp5md.lib"
set OMPDIR="C:/Program Files (x86)/Intel/oneAPI/compiler/2025.3/bin"

set CUDA_PATH=C:/Programs/NVIDIA_CUDA/13_3
set CUDACXX=%CUDA_PATH%/bin/nvcc.exe
set CudaToolkitBinDir=%CUDA_PATH%/bin
set CudaToolkitTargetBinDir=%CUDA_PATH%/lib

"C:\Program Files\CMake\bin\cmake.exe" "../src" -G "Visual Studio 17 2022" ^
  -DBUILD_SHARED_LIBS=ON -DBUILD_OCC=ON -DUSE_OCC=ON -DOCC_HAVE_HISTORY=ON ^
  -DUSE_CUDA=ON -DCUDAToolkit_ROOT=%CUDA_PATH%/ -DCUDAToolkitDir=%CUDA_PATH%/ ^
  -DCMAKE_INSTALL_PREFIX="C:/install-CUDA" ^
  -DUSE_MKL=ON -DOMP_DLL_DIR=%OMPDIR% -DNETGEN_USE_MPI=OFF -DOMP_LIBRARY=%OMPLIB% ^
  -DPython3_EXECUTABLE="C:/python313/python.exe" -DPython3_INCLUDE_DIRS="C:/python313/include" ^
  -DPython3_LIBRARIES="c:/Python313/libs/python313.lib" -DPython3_ROOT_DIR="C:/python313" ^
  -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="/Zi /O2 /Ob1 /DNDEBUG"

"C:\Program Files\CMake\bin\cmake.exe" --build . --config RelWithDebInfo --target install
cd ..

pause

I get the following error:

Compiling CUDA source file …\src\ngscuda\cuda_profiler.cu…

C:\ngsolve-netgen\build-CUDA\ngsolve\ngscuda>"C:\Programs\NVIDIA_CUDA\13_3/bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\HostX64\x64" -x cu -rdc=true -I"C:\ngsolve-netgen\src\ngscuda" -I"C:\ngsolve-netgen\src\solve" -I"C:\ngsolve-netgen\src\parallel" -I"C:\ngsolve-netgen\src\multigrid" -I"C:\ngsolve-netgen\src\bem" -I"C:\ngsolve-netgen\src\comp" -I"C:\ngsolve-netgen\src\linalg" -I"C:\ngsolve-netgen\src\fem" -I"C:\ngsolve-netgen\src\basiclinalg" -I"C:\ngsolve-netgen\src\ngstd" -I"C:\ngsolve-netgen\src\include" -I"C:\ngsolve-netgen\build-CUDA\ngsolve" -IC:\python313\Include -I"C:\install-CUDA\include\include" -I"C:\install-CUDA\include" -IC:\Programs\NVIDIA_CUDA\13_3\include -IC:\Programs\NVIDIA_CUDA\13_3\include\cccl -I"C:\Program Files (x86)\Intel\oneAPI\mkl\2024.0\include" -IC:\Programs\NVIDIA_CUDA\13_3\include --keep-dir ngscudalib\x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static -std=c++17 --expt-relaxed-constexpr --extended-lambda --diag-suppress=611 --diag-suppress=20011 --diag-suppress=20012 --diag-suppress=20013 --diag-suppress=20014 --diag-suppress=20015 -rdc=true /bigobj /arch:AVX512 /std:c++17 /wd4068 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DMAX_SYS_DIM=3 -DNGS_EXPORTS -DCUDA -DNETGEN_PYTHON -DNG_PYTHON -DPYBIND11_SIMPLE_GIL_MANAGEMENT -D_WIN32_WINNT=0x1000 -DWNT -DWNT_WINDOW -DNOMINMAX -DMSVC_EXPRESS -D_CRT_SECURE_NO_WARNINGS -DHAVE_STRUCT_TIMESPEC -DWIN32 -DHAVE_NETGEN_SOURCES -DUSE_TIMEOFDAY -DTCL -DLAPACK -DUSE_PARDISO -DNGS_PYTHON -DUSE_UMFPACK -D"CMAKE_INTDIR="RelWithDebInfo"" -Dngscudalib_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DMAX_SYS_DIM=3 -DNGS_EXPORTS -DCUDA -DNETGEN_PYTHON -DNG_PYTHON -DPYBIND11_SIMPLE_GIL_MANAGEMENT -D_WIN32_WINNT=0x1000-DWNT -DWNT_WINDOW -DNOMINMAX -DMSVC_EXPRESS -D_CRT_SECURE_NO_WARNINGS -DHAVE_STRUCT_TIMESPEC -DHAVE_NETGEN_SOURCES -DUSE_TIMEOFDAY -DTCL -DLAPACK -DUSE_PARDISO -DNGS_PYTHON -DUSE_UMFPACK -D"CMAKE_INTDIR="RelWithDebInfo"" -Dngscudalib_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS /Zi /MD /GR" -Xcompiler "/Fdngscudalib.dir\RelWithDebInfo\vc143.pdb" -o ngscudalib.dir\RelWithDebInfo\cuda_profiler.obj "C:\ngsolve-netgen\src\ngscuda\cuda_profiler.cu"

nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified

Can someone share the correct way to compile with CUDA on Windows?
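
One possible reading of the logged command, offered as an assumption rather than something the thread confirms: nvcc treats every argument that does not start with `-` as an input file, so MSVC-style switches such as `/bigobj /arch:AVX512 /std:c++17 /wd4068` that reach nvcc without `-Xcompiler` would count as extra input files and could trigger the "single input file" error. The sketch below scans an already tokenized command line for such switches. The function name `find_unforwarded_msvc_flags` is hypothetical, and the leading-slash check is a heuristic that only makes sense for Windows command lines like the one above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of NGSolve, CMake, or the CUDA toolkit).
// Returns the tokens that look like MSVC switches ("/name") and are NOT
// the argument of a preceding -Xcompiler / --compiler-options option.
std::vector<std::string> find_unforwarded_msvc_flags(
    const std::vector<std::string>& args) {
    std::vector<std::string> stray;
    bool forwarded = false;  // true when the previous token was -Xcompiler
    for (const std::string& arg : args) {
        if (forwarded) {
            // This token is the value handed to the host compiler; skip it.
            forwarded = false;
            continue;
        }
        if (arg == "-Xcompiler" || arg == "--compiler-options") {
            forwarded = true;
            continue;
        }
        // Heuristic: a token starting with '/' is an MSVC-style switch.
        // Tokens of the form -Xcompiler=... start with '-' and are ignored.
        if (!arg.empty() && arg[0] == '/') {
            stray.push_back(arg);
        }
    }
    return stray;
}
```

Applied to the tokens of the command above, the helper would report the four switches that sit between `-rdc=true` and `-Xcompiler=`. In CMake, flags like these are commonly limited to C++ sources with the `$<COMPILE_LANGUAGE:CXX>` generator expression or forwarded to the host compiler through `-Xcompiler`; whether this matches the fix proposed in the linked issue is not stated in the thread.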

I think I found the cause of the problem in the CMake file and created an issue: https://github.com/NGSolve/ngsolve/issues/90

Hi QB256,

Super, thank you for diving into that. Can you confirm that ngscuda then also works on Windows?

Our CMake expert will then incorporate your fix,

best, Joachim

The problem is actually more complicated and will require code changes. nvcc on Windows has problems with all of the header files, so it will probably be necessary to add #ifndef __CUDACC__ guards to keep nvcc from compiling code that is not intended for the CUDA environment.

I have no time right now to investigate such a deep problem.
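
For readers unfamiliar with the guard mentioned above, here is a minimal sketch of what it could look like. nvcc predefines the macro `__CUDACC__` while compiling CUDA source files, and a plain host compiler leaves it undefined. The function names and header contents are hypothetical placeholders, not taken from the NGSolve sources, and the thread does not say which headers would need such guards.

```cpp
#include <string>

// Host-only section: hidden from nvcc, visible to the host compiler.
#ifndef __CUDACC__
// Hypothetical stand-in for code that nvcc cannot digest
// (for example heavy templates or compiler-specific intrinsics).
inline std::string host_only_description() {
    return "compiled by the host compiler";
}
#endif

// Code that is safe for both compilers stays outside the guard.
inline int shared_dimension() { return 3; }

// Reports which kind of compiler processed this translation unit.
inline bool compiled_by_nvcc() {
#ifdef __CUDACC__
    return true;
#else
    return false;
#endif
}
```

With a guard like this, a `.cu` file can include the header and still use the shared part, while any caller of the host-only part must itself be compiled by the host compiler or sit behind the same guard.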