High Order Element Assembly Routines


i implemented a high order assembly routine for the element matrix (atm: only in 2D for the laplacian), which i want to compare with ngsolve’s routines and obv. sum-factorization. But i’m missing some information.

I haven’t found any reference to sum-factorization in ngsolve yet. Is it implemented in ngsolve? Either as assembly routine or incorporated in a cg-version (as e.g. in the monograph of Karniadakis/Sherwin and related work).

Since i haven’t implemented a matrix/mesh-free version yet, i want to compare the time until the global matrix is fully assembled.
What is the most efficient way to do this in ngsolve? (Is it just H1(mesh,order), a = bilinearform, a.assemble() or is there a faster method?)

What is the asymptotic complexity for your element matrix calculation?

Best regards

Hi Tim,

we don’t use sum-factorization for the matrix assembling, but we can use sum-factorization for matrix-free operator application with NGSolve. See tutorial 2-11 on matrix-free operator application, and 3-3 for matrix-free time-stepping methods.

We have timers for the major functions in NGSovle, which you can access via

 for t in Timers():
    print (t)

Look out for “Matrix assembling”, “SymbolicBFI::CalcElementMatrix”,
“static condensation”, “SparseMatrixSymmetric::AddElementMatrix”, “CGSolver”
They are meant for experts, some count wall-clock, some count single core CPU-time …

You are welcome to present and discuss your findings.

For matrix-free methods the main difficulty is the preconditioner. We have some experimental stuff floating around, but it is not sufficiently robust for release. Contributions are very welcome !

The complexity of element-matrix assembly in NGSolve is the same as static condensation, O(p^3d).
As far as I remember some comparison with sum-factorization, the break even was between 4 and 6 (on tetrahedral meshes). Of course, it heavily depends on the linear algebra kernel you use, and this is pretty optimized in NGSolve.


Hi Joachim,

thanks for the Answer. I will take a look at the timer.

Im experimenting with really high polynomial degrees.
Thus a small follow up question. Can i provide more local heap to the BilinearForm assemble routine?
I get limited at p=40, even if i only use like 2 triangles.