cluster installation issue (again)

Hi guys,

I am running into the same installation issue again…

A year ago, with your help, I installed ngsolve on a cluster, but I no longer have access to that machine…

Now I would like to get ngsolve installed on my local cluster, but it is causing issues again, as usual.

So, after make install, I got the following error message at the final linking stage (100%):

/users/gfu1/data/ngbuild/ngsolve-install/lib/libmesh.so: undefined reference to `gzclose'
/users/gfu1/data/ngbuild/ngsolve-install/lib/libmesh.so: undefined reference to `gzwrite'
/users/gfu1/data/ngbuild/ngsolve-install/lib/libmesh.so: undefined reference to `gzopen'
/users/gfu1/data/ngbuild/ngsolve-install/lib/libmesh.so: undefined reference to `gzread'

A similar linking issue came up a year ago, but the fix there was simple… here I don’t have a clue what’s going on.

Best,
Guosheng

Hi,

sounds like you are missing “zlib”.
On my Ubuntu 18.04, the package “zlib1g-dev” contains what you need. But at least on Ubuntu, I think that’s installed by default. Which OS are you using?
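
A quick way to check whether the zlib shared library is visible on your system is from Python, using `ctypes.util.find_library`, which searches roughly the same places the dynamic linker does (a sketch; the header still needs the dev package):

```python
# Sketch: check whether the zlib shared library can be located at all.
import ctypes.util

libz = ctypes.util.find_library("z")
print("libz:", libz)  # e.g. 'libz.so.1'; None means no zlib runtime was found
```

Note that linking against zlib at build time additionally needs the development files (header `zlib.h` and the `libz.so` symlink), which on RedHat-style systems come from a separate `-devel` package.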

Best,
Christoph

interesting…
I did have a zlib issue initially (what is it with zlib!?), so I passed flags to cmake pointing at the correct location,
-DZLIB_xxx=…
to locate zlib.

I thought this would be enough.

We have a RedHat 7 system…

Hi,

you could check your netgen cmake cache. This file is located in the netgen subfolder of your ngsolve build folder.
My CMakeCache.txt contains these lines:

//Path to a file.
ZLIB_INCLUDE_DIR:PATH=/usr/include

//Path to a library.
ZLIB_LIBRARY_DEBUG:FILEPATH=ZLIB_LIBRARY_DEBUG-NOTFOUND

//Path to a library.
ZLIB_LIBRARY_RELEASE:FILEPATH=/usr/lib/x86_64-linux-gnu/libz.so

Which variable did you set? The location given by the following variable should be used as a hint to search for the header and the library.

-DZLIB_INCLUDE_DIRS=...

Best,
Christoph

I have

ZLIB_INCLUDE_DIR:PATH=/gpfs/runtime/opt/zlib/1.2.8
//No help, variable specified on the command line.
ZLIB_LIBRARY:UNINITIALIZED=/gpfs/runtime/opt/zlib/1.2.8/lib

But I don’t have
ZLIB_LIBRARY_DEBUG
or
ZLIB_LIBRARY_RELEASE

In the cmake file, I used these two lines to locate zlib:

-DZLIB_INCLUDE_DIR=/gpfs/runtime/opt/zlib/1.2.8/
-DZLIB_LIBRARY=/gpfs/runtime/opt/zlib/1.2.8/lib \

If you specify it that way, you have to set

-DZLIB_LIBRARY=/gpfs/runtime/opt/zlib/1.2.8/lib/libz.so

And check that “libz.so” is actually in this folder.

Did you try to set

-DZLIB_INCLUDE_DIRS=/gpfs/runtime/opt/zlib/1.2.8/

I think this should find both the library and the header.

Best,
Christoph

Well,
-DZLIB_INCLUDE_DIRS
is not recognized…

Now I have an even worse issue while building netgen…

[ 26%] Built target visual
/tmp/ccV60kWH.s: Assembler messages:
/tmp/ccV60kWH.s:1100: Error: suffix or operands invalid for `vbroadcastsd'
make[8]: *** [libsrc/linalg/CMakeFiles/la.dir/bfgs.cpp.o] Error 1
[ 26%] Building CXX object libsrc/general/CMakeFiles/gen.dir/seti.cpp.o
[ 27%] Building CXX object libsrc/general/CMakeFiles/gen.dir/sort.cpp.o
/tmp/ccsdcFfI.s: Assembler messages:
/tmp/ccsdcFfI.s:896: Error: suffix or operands invalid for `vbroadcastsd'
/tmp/ccsdcFfI.s:1039: Error: suffix or operands invalid for `vbroadcastsd'
/tmp/ccsdcFfI.s:1504: Error: no such instruction: `vpbroadcastd %xmm0,%ymm0'
/tmp/ccsdcFfI.s:1505: Error: suffix or operands invalid for `vpaddd'
/tmp/ccsdcFfI.s:1510: Error: suffix or operands invalid for `vpaddd'
/tmp/ccsdcFfI.s:2813: Error: suffix or operands invalid for `vbroadcastsd'
/tmp/ccsdcFfI.s:3999: Error: suffix or operands invalid for `vbroadcastsd'
/tmp/ccsdcFfI.s:4462: Error: suffix or operands invalid for `vbroadcastsd'
make[8]: *** [libsrc/linalg/CMakeFiles/la.dir/densemat.cpp.o] Error 1

This didn’t happen yesterday :<

After deleting all the old files, the installation phase now works fine. It was the ZLIB issue…

However, I get a segmentation fault when I try

from ngsolve import *

FYI, besides the seg. fault,
this is what I got when I ran
make test_ngsolve

3% tests passed, 28 tests failed out of 29

Label Time Summary:
accuracy = 0.26 sec (3 tests)
performance = 0.09 sec (1 test)
standard = 0.95 sec (11 tests)

Total Test time (real) = 3.79 sec

The following tests FAILED:
1 - assemble.py (Failed)
2 - bla.py (Failed)
3 - poisson.py (Failed)
4 - adaptive.py (Failed)
5 - cmagnet.py (Failed)
6 - mixed.py (Failed)
7 - hybrid_mixed.py (Failed)
8 - hybrid_dg.py (Failed)
9 - taskmanager.py (Failed)
10 - compound.py (Failed)
11 - pickling.py (Failed)
12 - d1_square.pde (Failed)
13 - d2_chip.pde (Failed)
14 - d3_helmholtz.pde (Failed)
15 - d4_cube.pde (Failed)
16 - d5_beam.pde (Failed)
17 - d6_shaft.pde (Failed)
18 - d7_coil.pde (Failed)
19 - d8_coilshield.pde (Failed)
20 - d9_hybridDG.pde (Failed)
21 - d10_DGdoubleglazing.pde (Failed)
22 - d11_chip_nitsche.pde (Failed)
23 - d4_cube_performance.pde (Failed)
24 - acc_poisson_circle.pde (Failed)
25 - acc_poisson_circle_HDG.pde (Failed)
26 - acc_poisson_circle_HDG_hodc.pde (Failed)
28 - test_ngscxx (Not Run)
29 - pytest (Failed)
Errors while running CTest
gmake[4]: *** [test] Error 8
make[3]: *** [CMakeFiles/test_ngsolve] Error 2
make[2]: *** [CMakeFiles/test_ngsolve.dir/all] Error 2
make[1]: *** [CMakeFiles/test_ngsolve.dir/rule] Error 2
make: *** [test_ngsolve] Error 2

Hi Guosheng,

not everything from the sequential version is expected to work in parallel, see this list of supported MPI functionality:

https://ngsolve.org/docu/latest/how_to/howto_parallel.html

Joachim

Hi Guosheng,

Sorry to hear you are running into issues again :frowning: . Let’s see if we can resolve this.

Could you send me a backtrace of gdb for a simple .pde-file?

For example, execute “d1_square.pde” from the pde_tutorials with

mpirun -np 5  bash .wrap_mpirun gdb -batch -ex "run" -ex bt --args ngs d1_square.pde

“.wrap_mpirun” should be something like this (with OpenMPI); it just redirects each rank’s output into its own file:

#!/bin/sh
# run the given command, sending this rank's stdout and stderr to its own file
"$@" 1>out_p$OMPI_COMM_WORLD_RANK 2>&1

Also, please send me the output of:

which ngspy | xargs cat
which ngs | xargs ldd

Also, could you try to import netgen in python and see if that crashes too?
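
One way to run such import checks safely is to do each import in a fresh interpreter, so that a hard crash (segfault) in one module does not kill the checking script itself. A sketch (the helper is hypothetical, not part of NGSolve):

```python
# Sketch: try `import <module>` in a separate process; a segfault in the
# child only makes this return False instead of crashing the caller.
import subprocess
import sys

def import_ok(module):
    """True if `import <module>` exits cleanly in a fresh interpreter."""
    proc = subprocess.run([sys.executable, "-c", f"import {module}"],
                          capture_output=True)
    return proc.returncode == 0

# e.g.: import_ok("netgen"), import_ok("ngsolve")
```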

Finally, your CMakeCache.txt, the cmake command you used, and the cmake output would be useful.

In principle, the tests are supposed to work with MPI! In this case, of course, they all fail because something is wrong with the ngsolve-libraries.

While this is probably not the issue you are running into here, on the cluster the tests might also fail because you might not be allowed to run any MPI computations on the login node. If that is the issue, you have to go through the batch system (e.g. by switching to an interactive session and then running the tests as usual).

Best,
Lukas

Lukas,

Here is my cmake file:

 cmake \
-DUSE_UMFPACK=OFF \
-DCMAKE_PREFIX_PATH=/users/gfu1/data/ngsolve-install-plain \
-DCMAKE_BUILD_TYPE=Release \
-DINSTALL_DIR=/users/gfu1/data/ngsolve-install-plain \
-DUSE_GUI=OFF \
-DUSE_MPI=OFF \
-DUSE_MUMPS=OFF \
-DUSE_HYPRE=OFF \
-DUSE_MKL=ON \
-DMKL_ROOT=/gpfs/runtime/opt/intel/2017.0/mkl \
-DZLIB_INCLUDE_DIR=/gpfs/runtime/opt/zlib/1.2.8/ \
-DZLIB_LIBRARY=/gpfs/runtime/opt/zlib/1.2.8/lib/libz.so \
-DMKL_SDL=OFF \
-DCMAKE_CXX_COMPILER=/gpfs/runtime/opt/gcc/5.2.0/bin/g++ \
-DCMAKE_C_COMPILER=/gpfs/runtime/opt/gcc/5.2.0/bin/gcc \
 ../ngsolve-src                                                                   

I turned off MPI.
Attached is the CMakeCache.txt from the build directory.

I am doing everything on a compute node via an interactive session.

which ngs | xargs ldd gives the following:

	linux-vdso.so.1 =>  (0x00007fff643ff000)
	/usr/local/lib/libslurm.so (0x00007f7bf293f000)
	libsolve.so => /users/gfu1/data/ngsolve-install-plain/lib/libsolve.so (0x00007f7bf2644000)
	libngcomp.so => /users/gfu1/data/ngsolve-install-plain/lib/libngcomp.so (0x00007f7bf1adc000)
	libngfem.so => /users/gfu1/data/ngsolve-install-plain/lib/libngfem.so (0x00007f7bf0523000)
	libngla.so => /users/gfu1/data/ngsolve-install-plain/lib/libngla.so (0x00007f7befdc6000)
	libngbla.so => /users/gfu1/data/ngsolve-install-plain/lib/libngbla.so (0x00007f7befadb000)
	libngstd.so => /users/gfu1/data/ngsolve-install-plain/lib/libngstd.so (0x00007f7bef76c000)
	libnglib.so => /users/gfu1/data/ngsolve-install-plain/lib/libnglib.so (0x00007f7bef562000)
	libinterface.so => /users/gfu1/data/ngsolve-install-plain/lib/libinterface.so (0x00007f7bef308000)
	libstl.so => /users/gfu1/data/ngsolve-install-plain/lib/libstl.so (0x00007f7bef07b000)
	libgeom2d.so => /users/gfu1/data/ngsolve-install-plain/lib/libgeom2d.so (0x00007f7beee39000)
	libcsg.so => /users/gfu1/data/ngsolve-install-plain/lib/libcsg.so (0x00007f7beeb33000)
	libmesh.so => /users/gfu1/data/ngsolve-install-plain/lib/libmesh.so (0x00007f7bee623000)
	libz.so.1 => /gpfs/runtime/opt/zlib/1.2.8/lib/libz.so.1 (0x00007f7bee40d000)
	libvisual.so => /users/gfu1/data/ngsolve-install-plain/lib/libvisual.so (0x00007f7bee20c000)
	libpython3.6m.so.1.0 => /gpfs/runtime/opt/python/3.6.1/lib/libpython3.6m.so.1.0 (0x00007f7bedd04000)
	/gpfs/runtime/opt/intel/2017.0/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f7bed1e6000)
	/gpfs/runtime/opt/intel/2017.0/mkl/lib/intel64/libmkl_gnu_thread.so (0x00007f7bec01a000)
	/gpfs/runtime/opt/intel/2017.0/mkl/lib/intel64/libmkl_core.so (0x00007f7bea52a000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f7bea31e000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7bea101000)
	libstdc++.so.6 => /gpfs/runtime/opt/gcc/5.2.0/lib64/libstdc++.so.6 (0x00007f7be9d73000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f7be9aef000)
	libgomp.so.1 => /gpfs/runtime/opt/gcc/5.2.0/lib64/libgomp.so.1 (0x00007f7be98ce000)
	libgcc_s.so.1 => /gpfs/runtime/opt/gcc/5.2.0/lib64/libgcc_s.so.1 (0x00007f7be96b7000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f7be9323000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7bf2cf6000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007f7be911f000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f7be8f16000)

I do not have ngspy in my $NETGENDIR directory; only
ngs, ngscxx, ngsld
are available.
In the laptop version of ngsolve I also do not have ngspy…

Best,
Guosheng

Attachment: CMakeCache.txt

Now I have a complete rebuild of ngsolve.
The segmentation fault is gone, surprise!
But I have an MKL issue (this bug is way more friendly :>):

Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.

In my application I need a hybrid Poisson solver, so I am using the static-condensation approach for the implementation and applying a sparse Cholesky factorization to the resulting hybrid matrix.
It is this line that causes the code to crash:

inva = av.mat.Inverse(fes.FreeDofs(coupling=True), inverse="sparsecholesky")

I recall that I have another build with MKL turned off, which caused the seg. fault before, but now I am not completely sure…

Best,
Guosheng

If you have not enabled MPI, that makes things less complicated. You don’t need ngspy in that case
(that only exists because we had some issues with linking MKL libraries and MPI on certain systems).

You can simply run

gdb -ex run --args python3 poisson.py

or something similar.
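
If gdb is awkward to use on the cluster, Python’s built-in faulthandler module can at least dump the Python-level stack when the process segfaults (a generic sketch, not NGSolve-specific):

```python
# Sketch: dump a Python traceback when the process receives SIGSEGV.
import faulthandler
faulthandler.enable()  # installs handlers for SIGSEGV, SIGFPE, SIGABRT, ...

# ...then trigger the failing code, e.g.:
# from ngsolve import *
```

The same effect can be had without editing the script via `python3 -X faulthandler poisson.py`.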

Try turning MKL_SDL on,
-DMKL_SDL=ON
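
When MKL complains that it cannot load libmkl_avx.so or libmkl_def.so, it can also help to see which MKL shared objects were actually mapped into the process. A sketch that parses `/proc/<pid>/maps` text (the helper name is made up):

```python
# Sketch: list the MKL shared-object paths appearing in /proc/<pid>/maps text.
def mkl_libs_in_maps(maps_text):
    """Return sorted MKL .so paths found in the given maps text."""
    libs = set()
    for line in maps_text.splitlines():
        parts = line.split()
        # the mapped file path, when present, is the last column
        if len(parts) >= 6 and "mkl" in parts[-1]:
            libs.add(parts[-1])
    return sorted(libs)

# Usage inside the crashing process (Linux only):
# with open("/proc/self/maps") as f:
#     print(mkl_libs_in_maps(f.read()))
```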

Ha, with
-DMKL_SDL=ON
the installation works!

This will save me a lot of computing time, thank you guys! (hopefully everything works…)