MUMPS MPI runtime error

Hi, I am trying to use the MUMPS solver without mpiexec. This works from Python but not from C++. I compiled NGSolve myself with MUMPS, MKL and MPI enabled.

From the Python tutorials I learned that the only thing I have to do with MPI is:

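import sys
from mpi4py import MPI
import netgen.meshing
import ngsolve
from ngsolve import Mesh

comm = MPI.COMM_WORLD
rank = comm.rank
if rank == 0:
    # rank 0 loads the mesh from file and distributes it to the other ranks
    path = sys.argv[1]
    mesh = ngsolve.Mesh(path)
    ngmesh = mesh.ngmesh  # alternative: netgen.csg.unit_cube.GenerateMesh(maxh=0.05)
    ngmesh.Distribute(comm)
else:
    # all other ranks receive their part of the mesh
    ngmesh = netgen.meshing.Mesh.Receive(comm)
mesh = Mesh(ngmesh)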

Here the else branch is only relevant when running with mpiexec -n > 1, so I think the same should also work from C++?

    MPI_Init(&argc, &argv);

    netgen::NgMPI_Comm comm(MPI_COMM_WORLD);
    std::cout << argv[1] << std::endl;

    netgen::Ngx_Mesh mesh(argv[1], comm);
    std::cout << mesh.GetCommunicator().Size() << std::endl;
    std::shared_ptr<MeshAccess> ma = make_shared<MeshAccess>(mesh.GetMesh());
    shared_ptr<netgen::Mesh> ngmesh = ma->GetNetgenMesh();

But calling the Arnoldi solver with the MUMPS inverse gives me:

[x201t-arch:00000] *** An error occurred in MPI_Comm_rank
[x201t-arch:00000] *** reported by process [3076849664,0]
[x201t-arch:00000] *** on communicator MPI_COMM_WORLD
[x201t-arch:00000] *** MPI_ERR_COMM: invalid communicator
[x201t-arch:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[x201t-arch:00000] ***    and MPI will try to terminate your MPI job as well)

The Python code runs without error and also uses the self-compiled NGSolve build.

What could be going wrong here? Thanks for your help :slight_smile:

Hi Kai!
I don't see you redistributing the Netgen mesh in your C++ code, whereas you do so in Python :slight_smile:

Hi Kai,

Also, in Python you can load and distribute the mesh with one call:

mesh = ngsolve.Mesh(filename, comm)

If I read your mail right, you are saying that the C++ code works for mesh distribution and assembling. Can you confirm that?
Is it only when calling the MUMPS inverse that MPI reports an invalid communicator?

Joachim

Are you creating a ParallelMumpsInverse?

Hi Joachim :slight_smile: Yes, exactly. With any of the other direct solvers the code works and gives valid results.
I have created a single-file example, which you can find on GitHub here.

I just do:

    int num = 15;
    ngla::Complex shift(0, 4000);
    ngcore::Array<ngcore::Complex> lams(num);
    ngcore::Array<std::shared_ptr<ngla::BaseVector>> evecs(num);
    ngla::Arnoldi<ngla::Complex> arnoldi(bfa->GetMatrixPtr(), bfm->GetMatrixPtr(), fes->GetFreeDofs());

    arnoldi.SetShift(shift);
    // allowed is: 'sparsecholesky', 'pardiso', 'pardisospd', 'mumps', 'masterinverse', 'umfpack'
    arnoldi.SetInverseType("mumps");
    arnoldi.Calc(2*num+1, lams, num, evecs);

If I comment out the "Calc" line, the program runs without error.
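
One possible way to narrow this down further, as a rough sketch only: the handler shown in the log is MPI_ERRORS_ARE_FATAL, so a failing run aborts the whole process, and each inverse type therefore has to be tried in its own process. The Python helper below runs a command once per inverse type and records the exit code and the last line of stderr. The name sweep_inverse_types is made up for this illustration, and the sketch assumes the test program accepts the inverse type as its last command-line argument, which the snippet above does not do since it hard-codes "mumps".

```python
import subprocess

# Inverse types taken from the comment in the C++ snippet above.
INVERSE_TYPES = ["sparsecholesky", "pardiso", "pardisospd",
                 "mumps", "masterinverse", "umfpack"]


def sweep_inverse_types(command, inverse_types=INVERSE_TYPES, timeout=600):
    """Run `command + [inverse_type]` once per type, each in its own process.

    `command` is a list such as [executable, mesh_file] (hypothetical CLI).
    Returns a dict mapping inverse type -> (return code, last stderr line).
    """
    results = {}
    for inv in inverse_types:
        # A fatal MPI error only kills this child, not the sweep itself.
        proc = subprocess.run(command + [inv], capture_output=True,
                              text=True, timeout=timeout)
        stderr_lines = proc.stderr.strip().splitlines()
        last_line = stderr_lines[-1] if stderr_lines else ""
        results[inv] = (proc.returncode, last_line)
    return results
```

With a hypothetical executable ./arnoldi_example that takes the mesh file and the inverse type as arguments, the call would be sweep_inverse_types(["./arnoldi_example", "mesh.vol"]). A non-zero return code for "mumps" only would match what is described above.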