Hi,
I recently ran into a problem when trying to run the netgen GUI built with Open MPI (GitHub v6.2.1804):
[host-68.localdomain:12429] *** An error occurred in MPI_comm_size
[host-68.localdomain:12429] *** on communicator MPI_COMM_WORLD
[host-68.localdomain:12429] *** MPI_ERR_COMM: invalid communicator
[host-68.localdomain:12429] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
…
The full output is in the spoiler below:
[spoiler][code]
[user@host-68 ~]$ /usr/lib64/openmpi-compat/bin/mpirun -np 4 netgen
NETGEN-6.2-dev
Developed by Joachim Schoeberl at
2010-xxxx Vienna University of Technology
2006-2010 RWTH Aachen University
1996-2006 Johannes Kepler University Linz
Including OpenCascade geometry kernel
Running MPI - parallel using 4 processors
MPI-version = 2.1
optfile ./ng.opt does not exist - using default values
togl-version : 2
OCC module loaded
loading ngsolve library
NGSolve-…-…-…
Using Lapack
[host-68.localdomain:12429] *** An error occurred in MPI_comm_size
[host-68.localdomain:12429] *** on communicator MPI_COMM_WORLD
[host-68.localdomain:12429] *** MPI_ERR_COMM: invalid communicator
[host-68.localdomain:12429] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
mpirun has exited due to process rank 0 with PID 12429 on
node host-68.localdomain exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
[user@host-68 ~]$
[/code][/spoiler]
However, if libngsolve cannot be loaded, netgen starts fine and the GUI works:
[spoiler][code]
[user@host-68 ~]$ /usr/lib64/openmpi-compat/bin/mpirun -np 4 netgen
NETGEN-6.2-dev
Developed by Joachim Schoeberl at
2010-xxxx Vienna University of Technology
2006-2010 RWTH Aachen University
1996-2006 Johannes Kepler University Linz
Including OpenCascade geometry kernel
Running MPI - parallel using 4 processors
MPI-version = 2.1
optfile ./ng.opt does not exist - using default values
togl-version : 2
OCC module loaded
loading ngsolve library
cannot load ngsolve
error: couldn't load file "libngsolve.so": libngsolve.so: cannot open shared object file: No such file or directory
[user@host-68 ~]$
[/code][/spoiler]
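Since the crash only appears once libngsolve is loaded, one thing that may be worth checking (this is a guess on my side, not a confirmed cause) is whether the netgen binary and libngsolve.so resolve to the same MPI shared libraries. The sketch below parses `ldd` output and lists every library whose name contains "mpi". The helper names `mpi_libs` and `report` are made up for illustration, and the paths passed to `report` are placeholders to be replaced with the real install locations.

```python
import re
import subprocess

# Matches ldd lines such as
#   "libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00007f...)"
# and also "libmpi.so.1 => not found".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def mpi_libs(ldd_output):
    """Return {library name: resolved path} for MPI-looking entries in ldd output."""
    found = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        # Assumption: MPI runtime libraries have "mpi" in their file name
        # (libmpi, libmpi_cxx, libmpich, ...).
        if match and "mpi" in match.group(1).lower():
            found[match.group(1)] = match.group(2)
    return found


def report(*paths):
    """Run ldd on each path and print the MPI libraries it resolves to."""
    for path in paths:
        result = subprocess.run(["ldd", path], capture_output=True, text=True)
        print(path)
        for name, target in sorted(mpi_libs(result.stdout).items()):
            print(f"  {name} -> {target}")
```

Calling `report("/path/to/netgen", "/path/to/libngsolve.so")` with the actual paths prints the MPI libraries each file links against. If the two lists differ (for example, one points into an Open MPI directory and the other into a different MPI installation, or shows "not found"), that mismatch would be a plausible explanation for the invalid communicator error.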
Does anyone have hints on how to solve this?