NGSolve with CUDA on Windows - problem with compilation

I am trying to compile NGSolve with CUDA support on Windows. (It compiles fine without CUDA.)

My working directory is C:/ngsolve-netgen.

I am installing to C:/install-CUDA.

My build script is:

cd build-CUDA
set OMPLIB="C:/Program Files (x86)/Intel/oneAPI/compiler/2025.3/lib/libiomp5md.lib"
set OMPDIR="C:/Program Files (x86)/Intel/oneAPI/compiler/2025.3/bin"

set CUDA_PATH=C:/Programs/NVIDIA_CUDA/13_3
set CUDACXX=%CUDA_PATH%/bin/nvcc.exe
set CudaToolkitBinDir=%CUDA_PATH%/bin
set CudaToolkitTargetBinDir=%CUDA_PATH%/lib

"C:\Program Files\CMake\bin\cmake.exe" "../src" -G "Visual Studio 17 2022" -DBUILD_SHARED_LIBS=ON -DBUILD_OCC=ON -DUSE_OCC=ON -DOCC_HAVE_HISTORY=ON -DUSE_CUDA=ON -DCUDAToolkit_ROOT=%CUDA_PATH%/ -DCUDAToolkitDir=%CUDA_PATH%/ -DCMAKE_INSTALL_PREFIX="C:/install-CUDA" -DUSE_MKL=ON -DOMP_DLL_DIR=%OMPDIR% -DNETGEN_USE_MPI=OFF -DOMP_LIBRARY=%OMPLIB% -DPython3_EXECUTABLE="C:/python313/python.exe" -DPython3_INCLUDE_DIRS="C:/python313/include" -DPython3_LIBRARIES="c:/Python313/libs/python313.lib" -DPython3_ROOT_DIR="C:/python313" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="/Zi /O2 /Ob1 /DNDEBUG"

"C:\Program Files\CMake\bin\cmake.exe" --build . --config RelWithDebInfo --target install
cd ..

pause

I get the following error:

Compiling CUDA source file …\src\ngscuda\cuda_profiler.cu…

C:\ngsolve-netgen\build-CUDA\ngsolve\ngscuda>"C:\Programs\NVIDIA_CUDA\13_3/bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\HostX64\x64" -x cu -rdc=true -I"C:\ngsolve-netgen\src\ngscuda" -I"C:\ngsolve-netgen\src\solve" -I"C:\ngsolve-netgen\src\parallel" -I"C:\ngsolve-netgen\src\multigrid" -I"C:\ngsolve-netgen\src\bem" -I"C:\ngsolve-netgen\src\comp" -I"C:\ngsolve-netgen\src\linalg" -I"C:\ngsolve-netgen\src\fem" -I"C:\ngsolve-netgen\src\basiclinalg" -I"C:\ngsolve-netgen\src\ngstd" -I"C:\ngsolve-netgen\src\include" -I"C:\ngsolve-netgen\build-CUDA\ngsolve" -IC:\python313\Include -I"C:\install-CUDA\include\include" -I"C:\install-CUDA\include" -IC:\Programs\NVIDIA_CUDA\13_3\include -IC:\Programs\NVIDIA_CUDA\13_3\include\cccl -I"C:\Program Files (x86)\Intel\oneAPI\mkl\2024.0\include" -IC:\Programs\NVIDIA_CUDA\13_3\include --keep-dir ngscudalib\x64\RelWithDebInfo -maxrregcount=0 --machine 64 --compile -cudart static -std=c++17 --expt-relaxed-constexpr --extended-lambda --diag-suppress=611 --diag-suppress=20011 --diag-suppress=20012 --diag-suppress=20013 --diag-suppress=20014 --diag-suppress=20015 -rdc=true /bigobj /arch:AVX512 /std:c++17 /wd4068 -Xcompiler="/EHsc -Zi -Ob1" -D_WINDOWS -DNDEBUG -DMAX_SYS_DIM=3 -DNGS_EXPORTS -DCUDA -DNETGEN_PYTHON -DNG_PYTHON -DPYBIND11_SIMPLE_GIL_MANAGEMENT -D_WIN32_WINNT=0x1000 -DWNT -DWNT_WINDOW -DNOMINMAX -DMSVC_EXPRESS -D_CRT_SECURE_NO_WARNINGS -DHAVE_STRUCT_TIMESPEC -DWIN32 -DHAVE_NETGEN_SOURCES -DUSE_TIMEOFDAY -DTCL -DLAPACK -DUSE_PARDISO -DNGS_PYTHON -DUSE_UMFPACK -D"CMAKE_INTDIR="RelWithDebInfo"" -Dngscudalib_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DMAX_SYS_DIM=3 -DNGS_EXPORTS -DCUDA -DNETGEN_PYTHON -DNG_PYTHON -DPYBIND11_SIMPLE_GIL_MANAGEMENT -D_WIN32_WINNT=0x1000-DWNT -DWNT_WINDOW -DNOMINMAX -DMSVC_EXPRESS -D_CRT_SECURE_NO_WARNINGS -DHAVE_STRUCT_TIMESPEC -DHAVE_NETGEN_SOURCES -DUSE_TIMEOFDAY -DTCL -DLAPACK -DUSE_PARDISO -DNGS_PYTHON -DUSE_UMFPACK -D"CMAKE_INTDIR="RelWithDebInfo"" -Dngscudalib_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS /Zi /MD /GR" -Xcompiler "/Fdngscudalib.dir\RelWithDebInfo\vc143.pdb" -o ngscudalib.dir\RelWithDebInfo\cuda_profiler.obj "C:\ngsolve-netgen\src\ngscuda\cuda_profiler.cu"

nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
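
One possibly relevant detail in the command line above: the flags /bigobj, /arch:AVX512, /std:c++17 and /wd4068 are passed to nvcc bare, while the other MSVC options are wrapped in -Xcompiler. My guess (not confirmed) is that nvcc does not recognise these MSVC-style flags and counts them as additional input files, which would match the error message. To make such tokens easier to spot in a long command line, here is a small Python sketch. The helper name bare_msvc_flags is made up for illustration, the sample is a shortened version of the command above, and the check is deliberately naive: it assumes straight quotes and treats any token starting with "/" that does not directly follow -Xcompiler as suspicious.

```python
import shlex


def bare_msvc_flags(command_line):
    """Return tokens that start with '/' and do not directly follow -Xcompiler.

    Hypothetical helper for inspecting a pasted nvcc command line; it is a
    heuristic, not a description of how nvcc itself parses its arguments.
    """
    # posix=False keeps backslashes in Windows paths and keeps quoted
    # strings such as "/EHsc /W1" together as a single token.
    tokens = shlex.split(command_line, posix=False)
    found = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        if tok.startswith("/") and prev != "-Xcompiler":
            found.append(tok)
    return found


# Shortened version of the failing command (assumption: straight quotes).
sample = (
    'nvcc.exe -x cu -rdc=true /bigobj /arch:AVX512 /std:c++17 /wd4068 '
    '-Xcompiler="/EHsc -Zi -Ob1" -Xcompiler "/EHsc /W1" -o out.obj in.cu'
)
print(bare_msvc_flags(sample))
# prints ['/bigobj', '/arch:AVX512', '/std:c++17', '/wd4068']
```

If that guess is right, the question becomes where these four flags are added for the CUDA sources in the CMake configuration.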

Can someone share the correct way to compile with CUDA on Windows?