Bad TaskManager() performance for eigenvalue problems.

Dear all,

I use ngsolve to solve for eigenvalues/functions of 2D Helmholtz equations in an electrodynamics context. For my problem, I need to find a large number of such eigenvalues and eigenvectors (on the order of thousands). As a starting point I took the PML-Tutorial (1.7.1 Perfectly Matched Layer (PML) — NGS-Py 6.2.2302 documentation) and adapted it according to my needs. In principle, this works fine and the results nicely coincide with analytical solutions, where available.

The problem: Calculating many eigenvalues/vectors is, of course, computationally expensive, as the grid has to be very fine. Therefore, I tried to run the modified PML code on an HPC cluster (VSC4). I noticed, however, that the code actually got slower as the number of available CPU cores increased. Upon closer analysis, I found that this issue is related to the 'with TaskManager():' parallel computing environment and can already be reproduced on my laptop (which has an AMD CPU with 16 logical cores). Instead of accelerating the computation, the TaskManager() seems to result in longer runtimes, interestingly along with higher CPU usage. On my device, I can already see the effect with the PML tutorial (unfortunately, the forum's attachment function does not seem to work for me at the moment, so I just include the code I modified in this very tutorial).

With the TaskManager() this takes 1.7 s:
u = GridFunction(fes, multidim=50, name='resonances')
with TaskManager():
    lam = ArnoldiSolver(a.mat, m.mat, fes.FreeDofs(),
                        u.vecs, shift=400)

Without the TaskManager() this takes 0.9 s:
u = GridFunction(fes, multidim=50, name='resonances')
lam = ArnoldiSolver(a.mat, m.mat, fes.FreeDofs(),
                    u.vecs, shift=400)

Is there a workaround to get a performance boost from using several CPU cores? I would also welcome general suggestions for improving performance when computing several thousand eigenvectors/values with ngsolve.

I’d appreciate any suggestions.

All the best,

Hi Oliver,

the ArnoldiSolver requires a sparse direct solver, and you can choose which one:

ArnoldiSolver ( … inverse="sparsecholesky" )

The sparsecholesky is the in-house solver, which works for symmetric (not Hermitian!) matrices, as you get from PML. It works together with TaskManager.
Alternatives are umfpack or pardiso, which do their own threading and compete with TaskManager.
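
To see which combination is fastest on a given machine, one could time each direct solver with and without TaskManager. The following is only a rough sketch: the helper names timed and compare_inverses are made up for illustration, a, m and fes are assumed to be the assembled bilinear forms and the space from the PML tutorial, and whether umfpack or pardiso is available depends on how NGSolve was built. Recording CPU time next to wall time helps to spot the situation from the question, where CPU usage goes up while the runtime gets longer.

```python
import time
from contextlib import nullcontext


def timed(fn, ctx_factory=nullcontext):
    """Run fn() inside ctx_factory() and return (result, wall_s, cpu_s).

    cpu_s is the CPU time of the whole process (summed over all threads),
    so cpu_s much larger than wall_s means several threads were kept busy.
    """
    wall0, cpu0 = time.perf_counter(), time.process_time()
    with ctx_factory():
        result = fn()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return result, wall, cpu


def compare_inverses(a, m, fes, nev=50, shift=400,
                     inverses=("sparsecholesky", "umfpack", "pardiso")):
    """Hypothetical helper: time ArnoldiSolver per direct solver,
    once without and once with TaskManager."""
    # Imported lazily so that timed() can be used without NGSolve installed.
    from ngsolve import ArnoldiSolver, GridFunction, TaskManager

    timings = {}
    for inverse in inverses:
        for label, ctx in (("serial", nullcontext), ("TaskManager", TaskManager)):
            u = GridFunction(fes, multidim=nev, name="resonances")

            def solve():
                return ArnoldiSolver(a.mat, m.mat, fes.FreeDofs(),
                                     u.vecs, shift=shift, inverse=inverse)

            try:
                _, wall, cpu = timed(solve, ctx)
                timings[(inverse, label)] = (wall, cpu)
            except Exception as exc:
                # Assumption: a solver missing from this build raises an exception.
                timings[(inverse, label)] = exc
    return timings
```

After assembling a and m as in the tutorial, calling compare_inverses(a, m, fes) returns a dictionary keyed by (inverse, label). If the explanation above applies, the sparsecholesky entry should be the one that benefits from TaskManager, while umfpack and pardiso may show high CPU time without a shorter wall time.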

To get insight into the task-scheduling you can use the vite trace analyzer, see: