Using MPI without building from source

dfoiles · November 7, 2019, 4:13am

Hello,

I have been trying to run NGSolve in parallel on an HPC cluster, and I’d like to know whether the Launchpad download of NGSolve can be used with MPI. I have tried compiling it from source both on the cluster and in containers, but in both cases I ran into errors that I have been unable to fix.

Thanks

matthiash · November 7, 2019, 10:46am

Hello,

The setup on HPC clusters varies a lot, so we do not offer prebuilt binaries for such environments.
Often the default compiler on clusters is too old. Which OS and compiler are you using?
For further hints, I need your configuration (cmake) command and the complete command-line output.

Best,
Matthias

dfoiles · November 7, 2019, 8:12pm

My best attempt so far is building NGSolve in a Singularity container with an Ubuntu environment. In it, I’ve installed the packages listed on the “Build on Linux” page, as well as openmpi-bin, libopenmpi-dev, and numpy/scipy. My cmake command is:

cmake -DUSE_MPI=ON -DUSE_GUI=OFF -DCMAKE_INSTALL_PREFIX=${BASEDIR}/ngsolve-install ${BASEDIR}/ngsolve-src

The error message that I’ve received is quite long and I don’t know what parts are relevant, so I’ll attach a text file of the whole message. However, I think the important line is:

error: inlining failed in call to always_inline '__m256d _mm256_fmadd_pd(__m256d, __m256d, __m256d)': target specific option mismatch

I’ve searched for this error message myself and I’ve seen people suggest adding flags like “-msse4.1”, “-march=native”, “-march=nehalem”, and “-mavx” to CMAKE_CXX_FLAGS. I’ve tried this and have still gotten the same error.

Thank you for your willingness to help.

https://ngsolve.org/media/kunena/attachments/1252/Error.txt

Attachment: Error.txt

joachim · November 7, 2019, 8:41pm

Edit ngsolve/ngstd/simd.hpp, line 1047:

replace #ifdef __AVX2__ with
#ifdef __FMA__

and do the same at line 1065.

Joachim
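
For context, here is a minimal sketch of why this guard matters, assuming GCC or Clang on x86-64. The compiler defines __AVX2__ when AVX2 code generation is enabled, but it defines __FMA__ only when FMA is enabled as well, so a call to _mm256_fmadd_pd is safe only behind an __FMA__ check. The names fma4 and fma4_backend are hypothetical and are not taken from the NGSolve sources.

```cpp
#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Hypothetical helper (not NGSolve code): out[i] = a[i] * b[i] + c[i] for 4 doubles.
void fma4(const double* a, const double* b, const double* c, double* out)
{
#if defined(__AVX__) && defined(__FMA__)
    // FMA code generation is enabled, so the intrinsic can be inlined.
    __m256d r = _mm256_fmadd_pd(_mm256_loadu_pd(a),
                                _mm256_loadu_pd(b),
                                _mm256_loadu_pd(c));
    _mm256_storeu_pd(out, r);
#else
    // FMA is not enabled for this build: use plain scalar arithmetic.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i] + c[i];
#endif
}

// Reports which branch was compiled in; handy when checking build flags.
const char* fma4_backend()
{
#if defined(__AVX__) && defined(__FMA__)
    return "fma";
#else
    return "scalar";
#endif
}
```

Compiled with -mavx2 alone, this sketch takes the scalar branch. Adding -mfma, or -march=native on a CPU that supports FMA, selects the intrinsic branch. Guarding the intrinsic with __AVX2__ instead would reproduce the "target specific option mismatch" error whenever AVX2 is enabled without FMA.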

dfoiles · November 8, 2019, 6:55pm

That did the trick. Thank you very much for your help.

dfoiles · November 11, 2019, 9:46pm

Sorry to bother you again. Everything in the container is built and I’ve moved it to the HPC. I can successfully run the MPI tutorials provided in the source, but when I try to run my program, I get segmentation faults. Specifically, I get:

[node4][[23534,1],3][btl_tcp_frag.c:237:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[node4][[23534,1],4][btl_tcp_frag.c:237:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
Caught SIGSEGV: segmentation fault
Collecting backtrace...
#1 /opt/ngsuite/ngsolve-install/lib/python3/dist-packages/netgen/../../../libngcore.so(+0x1864b) [0x7fa70f33664b]
#2 /lib/x86_64-linux-gnu/libc.so.6(+0x43f60) [0x7fa7109bef60]

I have attached a copy of the code that I used.

Thank you for your help.

https://ngsolve.org/media/kunena/attachments/1252/Nanosphere.py

Attachment: Nanosphere_2019-11-11.py