Could MPI Give Wrong Results with Certain Node Counts?

Hi everyone,

I am encountering an issue where the results from my MPI parallel computations differ depending on the number of nodes used.

We have a mesh with around 35,000 DOFs, and we use H1 spaces defined on subdomains. I observed the following behaviour:

  • Running without MPI, as well as using 10 and 21 nodes, gives the same results.
  • Running with 20, 25, or 35 nodes also gives results that are consistent with each other, but different from the non-MPI case (a sketch of partition-independent diagnostics for comparing runs follows this list).
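
To make "same vs. different results" concrete, here is the kind of partition-independent diagnostic that can be compared between runs. This is a simplified, library-agnostic sketch using mpi4py/numpy; `solution_diagnostics` and `local_solution` are placeholder names, and it assumes each DOF is owned by exactly one rank so nothing is double-counted:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def solution_diagnostics(local_solution: np.ndarray) -> None:
    # local_solution is assumed to hold only the DOF values owned by this rank
    # (shared interface DOFs counted on exactly one rank), so the reductions
    # below give the same global quantities for any node count
    local_sq = float(np.dot(local_solution, local_solution))
    global_sq = comm.allreduce(local_sq, op=MPI.SUM)    # global squared L2 norm
    global_min = comm.allreduce(float(local_solution.min(initial=np.inf)), op=MPI.MIN)
    global_max = comm.allreduce(float(local_solution.max(initial=-np.inf)), op=MPI.MAX)
    if comm.rank == 0:
        print(f"ranks={comm.size}  ||u||_2={np.sqrt(global_sq):.16e}  "
              f"min={global_min:.16e}  max={global_max:.16e}")
```

Logging these numbers for each node count makes it easy to see whether the discrepancy is at the level of roundoff or something much larger.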

I have done some checks to ensure correctness:

  • Integrating grid functions over volumes and surfaces (to compare areas/volumes) gives the same values with and without MPI, regardless of the number of nodes.
  • The material properties at every mesh point are identical with and without MPI (pointwise comparison; see the probe sketch below).
  • Both the MPI direct solver and the iterative solver give the same results.
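
For the pointwise comparisons (the material properties, and potentially the solution itself), a probe pattern like the following works in a distributed setting. This is only a sketch: `evaluate_at(p)` is a hypothetical helper standing in for "evaluate the coefficient/grid function at physical point p", returning the value if p lies in an element on this rank and `None` otherwise, and the probe locations are made up:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# hypothetical probe locations inside the domain
sample_points = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.9, 0.1, 0.7)]

def probe(evaluate_at):
    for i, p in enumerate(sample_points):
        local = evaluate_at(p)
        # only the rank(s) owning the point contribute a real value;
        # everyone else contributes -inf, so MAX picks the owner's value
        value = comm.allreduce(local if local is not None else -np.inf, op=MPI.MAX)
        if comm.rank == 0:
            print(f"point {i} at {p}: {value:.16e}")
```

Diffing these printed values against the serial run is a stricter test than comparing integrals, since integral checks can hide local differences that happen to cancel.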

I was wondering what could be causing this discrepancy. Could it be related to numerical precision or mesh partitioning effects? Has anyone encountered something similar before?
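
Regarding the precision hypothesis: changing the number of ranks changes the order in which contributions are summed, but that alone should only move the result around machine epsilon. A small numpy experiment along those lines (the values are placeholders; the chunk counts mimic the node counts above):

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.random(35_000)   # same order of magnitude as the DOF count

reference = values.sum()
for nparts in (10, 20, 21, 25, 35):
    # per-"rank" partial sums followed by a global sum, like an MPI reduction
    partial_sums = [chunk.sum() for chunk in np.array_split(values, nparts)]
    total = sum(partial_sums)
    print(f"{nparts:3d} parts: relative difference = {abs(total - reference) / abs(reference):.2e}")
```

If the differences I see between the MPI and non-MPI runs are orders of magnitude larger than that, I would suspect the partitioning (e.g. how the subdomain-restricted spaces are handled across ranks) rather than pure roundoff.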

Any insights or suggestions on further debugging would be greatly appreciated!

Thanks!
