Netgen GUI fails to start when libngsolve loaded and MPI=ON (ALT Linux)

Hi,

I recently ran into a problem when trying to run the Netgen GUI built with Open MPI (GitHub v6.2.1804):

[host-68.localdomain:12429] *** An error occurred in MPI_comm_size
[host-68.localdomain:12429] *** on communicator MPI_COMM_WORLD
[host-68.localdomain:12429] *** MPI_ERR_COMM: invalid communicator
[host-68.localdomain:12429] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

The full output is under the spoiler:
[spoiler][code]
[user@host-68 ~]$ /usr/lib64/openmpi-compat/bin/mpirun -np 4 netgen
NETGEN-6.2-dev
Developed by Joachim Schoeberl at
2010-xxxx Vienna University of Technology
2006-2010 RWTH Aachen University
1996-2006 Johannes Kepler University Linz
Including OpenCascade geometry kernel
Running MPI - parallel using 4 processors
MPI-version = 2.1
optfile ./ng.opt does not exist - using default values
togl-version : 2
OCC module loaded
loading ngsolve library
NGSolve-…-…-…
Using Lapack
[host-68.localdomain:12429] *** An error occurred in MPI_comm_size
[host-68.localdomain:12429] *** on communicator MPI_COMM_WORLD
[host-68.localdomain:12429] *** MPI_ERR_COMM: invalid communicator
[host-68.localdomain:12429] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

mpirun has exited due to process rank 0 with PID 12429 on
node host-68.localdomain exiting improperly. There are two reasons this could occur:

  1. this process did not call "init" before exiting, but others in
    the job did. This can cause a job to hang indefinitely while it waits
    for all processes to call "init". By rule, if one process calls "init",
    then ALL processes must call "init" prior to termination.

  2. this process called "init", but exited without calling "finalize".
    By rule, all processes that call "init" MUST call "finalize" prior to
    exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

[user@host-68 ~]$
[/code][/spoiler]

However, if libngsolve is not loaded, netgen starts fine and the GUI works:

[spoiler][code]
[user@host-68 ~]$ /usr/lib64/openmpi-compat/bin/mpirun -np 4 netgen
NETGEN-6.2-dev
Developed by Joachim Schoeberl at
2010-xxxx Vienna University of Technology
2006-2010 RWTH Aachen University
1996-2006 Johannes Kepler University Linz
Including OpenCascade geometry kernel
Running MPI - parallel using 4 processors
MPI-version = 2.1
optfile ./ng.opt does not exist - using default values
togl-version : 2
OCC module loaded
loading ngsolve library
cannot load ngsolve
error: couldn't load file "libngsolve.so": libngsolve.so: cannot open shared object file: No such file or directory
[user@host-68 ~]$
[/code][/spoiler]

Any hints on how to solve this?

Hi,

This sounds like libgomp needs to be added to the preload path. Maybe this will help. First, locate the library:

find / -name libgomp.so.1 2>&1 | grep -v "Permission denied"
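Searching all of / can take a while. As a rough alternative, here is a sketch that only checks a few common library directories. The helper name find_lib and the directory list are my own assumptions, not something that ships with Netgen, so adjust them for your distribution:

```shell
#!/bin/sh
# Hypothetical helper: print every "<dir>/<name>" that exists among the
# directories given after the library name.
find_lib() {
    name="$1"
    shift
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

# The directory list is an assumption; extend it if your system differs.
find_lib libgomp.so.1 /usr/lib64 /usr/lib /usr/local/lib64 /usr/local/lib /lib64 /lib
```

If this prints nothing, fall back to the find command above.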

Then, in the directory where the netgen binary is installed, look for the small text file ngspy.

Edit that file, inserting the path to libgomp into the preload path so it looks something like this:

LD_PRELOAD=$LD_PRELOAD:/act/openmpi-2.0/gcc-7.2.0/lib/libmpi.so:/opt/intel/mkl/lib/intel64/libmkl_core.so:/opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so:/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so:/opt/intel/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.so[b]:/usr/lib64/libgomp.so.1[/b] /home/ddrake/common/install/bin/python3 $*
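If you would rather try the preload from a shell before editing ngspy, something like the following sketch may help. The helper name append_preload is made up for illustration, and the libgomp path is just the one from my system, so substitute whatever the search reported on yours:

```shell
#!/bin/sh
# Hypothetical helper: append a library to a colon-separated preload list
# without leaving an empty leading entry when the list starts out empty.
append_preload() {
    current="$1"   # existing list, may be empty
    lib="$2"       # library path to add
    if [ -z "$current" ]; then
        printf '%s\n' "$lib"
    else
        printf '%s\n' "$current:$lib"
    fi
}

# Assumed path, taken from the example line above; adjust as needed.
LIBGOMP=/usr/lib64/libgomp.so.1
NEW_PRELOAD=$(append_preload "${LD_PRELOAD:-}" "$LIBGOMP")
echo "LD_PRELOAD would be: $NEW_PRELOAD"
```

You could then prefix a single command with LD_PRELOAD="$NEW_PRELOAD" to see whether the error goes away, without changing the installed script.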

Best,

Dow