MPI related questions

Hello,

I am just starting to play with MPI in NGSolve and have a couple of questions:

(1) For my installation, I ask NGSolve to download and build metis, mumps, and hypre for me. While it succeeds in building all the packages, I cannot get mumps working. In the final linking phase, my libngla.so has some undefined references:
../linalg/libngla.so: undefined reference to `blacs_gridinfo_'
../linalg/libngla.so: undefined reference to `pzpotrf_'

So, in my working installation I have to turn off mumps. This is always the most painful part for me due to my limited C++ experience… Do you see an immediate fix for this? I can provide more details about my build if you want.

(2) In https://ngsolve.org/docu/latest/how_to/howto_parallel.html, it says the taskmanager allows for hybrid parallelization. I am assuming this means MPI+OpenMP, so how do I do this hybrid parallelization? I tried to run
mpirun -n 4 ngspy mpi_poisson.py
with SetNumThreads(8) in the code, but it didn’t work…

(3) Again in https://ngsolve.org/docu/latest/how_to/howto_parallel.html, it says MPI does not support periodic boundaries yet :<
But in the git repository there is a quite recent commit on MPI+periodic… have you been working on this issue recently?

Best always,
Guosheng

  1. I think the problem here is the loading of the BLACS/ScaLAPACK libraries.
    You can:
    [ul]
    [li]Use “ngspy” instead of python3[/li]
    [li]Set these libraries in LD_PRELOAD (have a look at the “ngspy” file in the NGSolve bin directory)[/li]
    [li]In your Python scripts, before importing ngsolve (see the sketch after this list):
from ctypes import CDLL, RTLD_GLOBAL
for lib in THE_LIBRARIES_YOU_NEED:
    CDLL(lib, RTLD_GLOBAL)

[/li]
[/ul]
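
For example, a minimal sketch for an MKL-based build (the library names below are my assumptions for an MKL setup, not taken from your system; use whatever libraries your linker error reports as missing, with full paths if they are not on the loader path):

# Preload BLACS/ScaLAPACK globally before importing ngsolve, so that
# libngla.so can resolve symbols like blacs_gridinfo_ and pzpotrf_.
# The library names are illustrative (MKL variants); adjust to your installation.
from ctypes import CDLL, RTLD_GLOBAL

for lib in ["libmkl_rt.so",                    # MKL runtime (BLAS/LAPACK)
            "libmkl_blacs_intelmpi_lp64.so",   # BLACS
            "libmkl_scalapack_lp64.so"]:       # ScaLAPACK
    CDLL(lib, RTLD_GLOBAL)

import ngsolve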

  2. MPI & C++11 threads. It should work like this:
ngsglobals.numthreads=X

Assembling/Applying of BLFs will be hybrid parallel, but most of the solvers/preconditioners will still be MPI-only.
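
For example, a minimal sketch of where these settings go (the mesh and bilinear form here are purely illustrative and set up serially; in the actual mpi_poisson.py demo the mesh is distributed over the MPI ranks):

from ngsolve import *
from netgen.geom2d import unit_square

ngsglobals.numthreads = 8      # threads per MPI rank, set before assembling

mesh = Mesh(unit_square.GenerateMesh(maxh=0.1))
fes = H1(mesh, order=3)
u, v = fes.TrialFunction(), fes.TestFunction()

a = BilinearForm(fes)
a += SymbolicBFI(grad(u) * grad(v))

with TaskManager():            # assembly uses the thread pool on every rank
    a.Assemble()

Started with, e.g., "mpirun -n 4 ngspy mpi_poisson.py", this gives 4 MPI ranks with 8 threads each.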

  3. It should work. Keep in mind that your mesh must contain surface and BBND elements (this is only an issue with manually generated meshes). Please contact me if you run into any problems with this.
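
In case it helps, here is a rough sketch of the periodic setup itself, modeled on the periodic-boundary example from the NGSolve documentation (the geometry code is my reconstruction, and the MPI mesh distribution from mpi_poisson.py is omitted for brevity):

from ngsolve import *
from netgen.geom2d import SplineGeometry

# Unit square whose top/bottom and left/right edges are identified periodically.
geo = SplineGeometry()
pnts = [(0, 0), (1, 0), (1, 1), (0, 1)]
pnums = [geo.AppendPoint(*p) for p in pnts]

lbot = geo.Append(["line", pnums[0], pnums[1]], bc="periodic")
lright = geo.Append(["line", pnums[1], pnums[2]], bc="periodic")
# The copies must run parallel to their originals, hence the swapped left/right domains.
geo.Append(["line", pnums[3], pnums[2]], leftdomain=0, rightdomain=1, copy=lbot, bc="periodic")
geo.Append(["line", pnums[0], pnums[3]], leftdomain=0, rightdomain=1, copy=lright, bc="periodic")

mesh = Mesh(geo.GenerateMesh(maxh=0.1))
fes = Periodic(H1(mesh, order=3))   # periodic wrapper around the H1 space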

I still can’t get mumps working…

Here are the details of my build; maybe you can help me find the bug:
(1) I have a local gcc-8.1, python3, and mpich installed
(2) I am using the Intel MKL library for LAPACK/BLAS

do-configure.txt contains my cmake details
c.txt is the output of running “./do-configure”,
m.txt is the output of running “make VERBOSE=1”, which produces the error message at the final linking stage (lines 4744–4756)

Thanks!

Attachment: c.txt

Attachment: m.txt

Attachment: do-configure.txt

Wait, this is exactly the same error I encountered two years ago (when I was installing on another machine):
https://ngsolve.org/forum/ngspy-forum/11-install-ngsolve-without-root-access?start=24
It was the MKL library issue, and I fixed it by using the static MKL library: -DMKL_STATIC=ON lol

But then I encountered an issue with MUMPS. In the demo code mpi_poisson.py, I refined the mesh twice with ngmesh.Refine() to make the problem bigger; the mumps solver then failed to factorize the matrix and exited with a segmentation fault… hypre and masterinverse are working fine…
Is it a bug, or is there still something wrong with my installation?

Did it crash or did it terminate with an error message? I am looking into it.

For the mpi_poisson.py file (with two mesh refinements), I run

mpirun -n X ngspy mpi_poisson.py

The code works fine if I take X to be 1 or 2 (converged in 1 iteration).
But it crashes with a seg fault if I take X to be 3.

And it generates the following message if I take X to be 4:

Update Direct Solver PreconditionerMumps Parallel inverse, symmetric = 0
analysis ... factor ...            2 :INTERNAL Error: recvd root arrowhead 
           2 :not belonging to me. IARR,JARR=      -41598          13
           2 :IROW_GRID,JCOL_GRID=           0           0
           2 :MYROW, MYCOL=           0           2
           2 :IPOSROOT,JPOSROOT=       10982           0
application called MPI_Abort(MPI_COMM_WORLD, -99) - process 3

But mumps works fine with the smaller system when I only do one mesh refinement…

I am looking into it.

An update on the MUMPS issue:

I was able to reproduce the issue, but was unable to find a bug in NGSolve that caused it.

I just tried it with the newest MUMPS release (5.2.0) and it works without issues now.
I then looked in the release notes of MUMPS and found:
“Workaround a segfault at beg. of facto due to a gfortran-8 bug”
Might this also be the issue you are having?

You can update the MUMPS version that is built with NGSolve in the file
NGSOLVE-SRC/cmake/external_projects/mumps.cmake
Replace the URL/URL_MD5 lines with the link and checksum of the newest version:
URL "http://mumps.enseeiht.fr/MUMPS_5.2.0.tar.gz"
URL_MD5 cd6d06f27ce2689eb0436e41fcc9caed

Best, Lukas