Adding >65000 CoefficientFunctions together results in segfault

Hello there! I encountered a somewhat bizarre error in my large-scale computations: The following code will reliably hit “Segmentation fault (core dumped)” upon the mip evaluation when run in Python 3.10.12, Ngsolve 6.2.2404.post51:

from ngsolve import *

mesh = Mesh(unit_square.GenerateMesh(maxh=0.2))
mip = mesh(0.2,0.4)

iters = 66000
cf = CoefficientFunction(0)

for i in range(int(iters)):
    cf += 1
    
print(cf(mip))

For slightly lower values of iters, it will either run very quickly, e.g. 65000 is apparently safe, while 65500 will randomly either run quickly or segfault with no apparent consistency.

While adding many CoefficientFunctions together like this might appear pointless, I would like to avoid interpolation to GridFunctions as much as possible. I have a functioning workaround by running the CoefficientFunction summation in batches and regularly interpolating the result to a fes, but this is both slower and less accurate than if I had the exact sum of the CoefficientFunctions available. As I need a large-scale computation with highly precise adjoint computations for my optimal experimental design problem, this is a slight issue (as the CoefficientFunctions represent sensors in my experimental design, they are naturally a bit more complicated than just = 1).

Is there any option I can tweak that would fix this? I am guessing it somehow relates to the length of the expression tree (although regular use of .Compile() only slows the code down dramatically and does not avoid segfaults).

Thanks for your time!
Christian

For me the code runs cleanly, also valgrind doesn’t find anything.
which OS are you running?

best
Christopher

Interesting! This is on Ubuntu 22.04.4 LTS. Maybe it will run into an error for larger values of iters on other OSes?

The standard evaluation of expression trees is by recursive function calls.
Thus there are limits on the expression tree by the stack-size.

When running in parallel (TaskManager), the stack-size on MacOS is limited to 8MB, for the parallel threads. On Linux you can increase that limit, but I never found a solution for Mac.

If you can build a better balanced tree, you don’t need deep recursive function calls.

best, Joachim

I don’t know how I would be able to improve my tree balance, since the practical use case is adding 25k-100k more or less linearly independent CoefficientFunctions, e.g. exp(-c|(x,y)-(x_k,y_k)|^2) for some reasonable c and a collection of different base points (x_k,y_k). Any suggestions would be great!

How does one increase the tree size on Linux? I can’t find it from TaskManager’s documentation (and by default, TaskManager does not prevent the above code from segfaulting). On my hardware, speed/memory doesn’t seem like a limiting factor per se.

Best,
Christian

The plus-operator is a node in the tree. When you iteratively add 1, the constant 1 will always be the right sub-tree, the previous goes to the left. So the tree has depth n. We can visualize the tree in ngsolve by printing the coefficient funciton. The depth of indention shows the tree level. Here are 3 examples (with 16 = 4*4 = 2**4 terms). Last one is the best.

If you have a list of N terms, make a recursive function which adds the sum of the first half of terms, with the sum of the second half of terms.

from ngsolve import *

mesh = Mesh(unit_square.GenerateMesh(maxh=0.2))
mip = mesh(0.2,0.4)

# version 1: summing n terms
cf = CoefficientFunction(0)
for i in range(16):
    cf += 1
print(cf)
print(cf(mip))

# version 2: summing n*n terms:
cf = CF(0)
for i in range(4):
    partsum = CF(0)
    for j in range(4):
        partsum += 1
    cf += partsum
print (cf)
print (cf(mip))

# version 3: perfectly balanced tree:
cf = CF(1)
for i in range(4):
    cf = cf+cf
print (cf)
print (cf(mip))    

the outcome is:

coef binary operation '+', real
  coef binary operation '+', real
    coef binary operation '+', real
      coef binary operation '+', real
        coef binary operation '+', real
          coef binary operation '+', real
            coef binary operation '+', real
              coef binary operation '+', real
                coef binary operation '+', real
                  coef binary operation '+', real
                    coef binary operation '+', real
                      coef binary operation '+', real
                        coef binary operation '+', real
                          coef binary operation '+', real
                            coef binary operation '+', real
                              coef 1, real
                              coef 1, real
                            coef 1, real
                          coef 1, real
                        coef 1, real
                      coef 1, real
                    coef 1, real
                  coef 1, real
                coef 1, real
              coef 1, real
            coef 1, real
          coef 1, real
        coef 1, real
      coef 1, real
    coef 1, real
  coef 1, real

16.0
coef binary operation '+', real
  coef binary operation '+', real
    coef binary operation '+', real
      coef binary operation '+', real
        coef binary operation '+', real
          coef binary operation '+', real
            coef 1, real
            coef 1, real
          coef 1, real
        coef 1, real
      coef binary operation '+', real
        coef binary operation '+', real
          coef binary operation '+', real
            coef 1, real
            coef 1, real
          coef 1, real
        coef 1, real
    coef binary operation '+', real
      coef binary operation '+', real
        coef binary operation '+', real
          coef 1, real
          coef 1, real
        coef 1, real
      coef 1, real
  coef binary operation '+', real
    coef binary operation '+', real
      coef binary operation '+', real
        coef 1, real
        coef 1, real
      coef 1, real
    coef 1, real

16.0
coef binary operation '+', real
  coef binary operation '+', real
    coef binary operation '+', real
      coef binary operation '+', real
        coef unary operation ' ', real
          coef 1, real
        coef unary operation ' ', real
          coef 1, real
      coef binary operation '+', real
        coef unary operation ' ', real
          coef 1, real
        coef unary operation ' ', real
          coef 1, real
    coef binary operation '+', real
      coef binary operation '+', real
        coef unary operation ' ', real
          coef 1, real
        coef unary operation ' ', real
          coef 1, real
      coef binary operation '+', real
        coef unary operation ' ', real
          coef 1, real
        coef unary operation ' ', real
          coef 1, real
  coef binary operation '+', real
    coef binary operation '+', real
      coef binary operation '+', real
        coef unary operation ' ', real
          coef 1, real
        coef unary operation ' ', real
          coef 1, real
      coef binary operation '+', real
        coef unary operation ' ', real
          coef 1, real
        coef unary operation ' ', real
          coef 1, real
    coef binary operation '+', real
      coef binary operation '+', real
        coef unary operation ' ', real
          coef 1, real
        coef unary operation ' ', real
          coef 1, real
      coef binary operation '+', real
        coef unary operation ' ', real
          coef 1, real
        coef unary operation ' ', real
          coef 1, real

16.0