Performance bottleneck in a.Apply(gfu.vec, r)


We are doing Model Order Reduction for a non-linear system, using NGSolve as the FEM backend.
After selecting specific elements from the full mesh, we solve on the reduced mesh using a BilinearForm that is restricted to those elements via the definedonelements parameter:

 aMini += SymbolicBFI(weights_gf*timestep*weak_K, definedonelements=act_el)

Profiling my code shows that the Apply method is a bottleneck: either it does not scale well with the number of elements, or it has a fixed overhead that is not negligible in terms of time.

In contrast, the AssembleLinearization method scales very well with the number of selected elements.
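
To tell poor scaling apart from a fixed overhead, one option is to time the call for several sizes of the active element set and fit a line t = t0 + c * n to the timings. The following is only a minimal sketch in plain Python: the helper names time_call and fit_overhead are hypothetical and not part of NGSolve, and the linear cost model is an assumption.

```python
import time


def time_call(fn, repeats=5):
    """Return the best wall-clock time (seconds) of fn() over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def fit_overhead(num_elements, seconds):
    """Least-squares fit of seconds = t0 + c * num_elements.

    Returns (t0, c): t0 is the estimated fixed overhead per call,
    c is the estimated cost per active element.
    """
    n = len(num_elements)
    if n < 2 or n != len(seconds):
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_x = sum(num_elements) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in num_elements)
    if sxx == 0:
        raise ValueError("element counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(num_elements, seconds))
    c = sxy / sxx
    t0 = mean_y - c * mean_x
    return t0, c
```

In my setting the timed call would be something like time_call(lambda: aMini.Apply(gfu.vec, r)), repeated for different numbers of set entries in act_el, and the same for AssembleLinearization. A large t0 compared to c * n would point to a fixed overhead rather than to bad scaling.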

Could the developers have a short look at that? Perhaps there are some easy improvements to boost the Apply method?

Kind regards