Hello,

I am trying to speed up my HDG Naver Stokes solver and need some help on technical details.

In my 2D lid driven cavity problem, I am using a sparse cholesky factorization to solve the linear system,

and all integration terms are already appended with the command `.Compile(True,True)`

.

I found that for a particular problem (k=2 on 128 X 128 grid), the cost of residual update is about twice more expensive than the linear system solver. So I am looking into alternative ways to speed up the residual updates. The residual term contains the nonlinear convection and linear viscosity operators.

I tried two ways for the viscosity operator updates:

```
visc = LinearForm(V)
visc += Grad(uh)*Grad(v)*dx
...
visc.Assemble()
```

and

```
visc =BinearForm(V, nonassemble=True)
visc += Grad(u)*Grad(v)*dx
...
res = visc.mat*uh.vec
```

Surprisingly, the second BilinearForm approach is about 4 times faster than the LinearForm approach. Why is this the case? I thought the two approaches shall be the same (is it faster to take derivatives of proxi functions than grid functions??).

Also, attached is the snippet for my residuals:

```
#### diffusion update: very expensive
visc = BilinearForm(fes, nonassemble=True)
visc += (-nu*InnerProduct(gradu,gradv)).Compile(True,True)*dx
visc += (nu*(gradu*n*tang(v-vhat)+gradv*n*tang(u-uhat)
-penalty*tang(u-uhat)*tang(v-vhat))).Compile(True,True)*dx(element_boundary=True)
conv = BilinearForm(fes, nonassemble=True)
conv += (-InnerProduct(Grad(u)*u, v)).Compile(True,True)*dx
vminus = IfPos(u*n, v.Other(), v)
conv += u2*n*(u-u.Other())*vminus.Compile(True,True)*dx(skeleton=True)
....
res.data = visc.mat*gfu.vec + conv.mat*gfu.vec
```

Are there immediate ways to further speed up this step? Thank you.

Best,

Guosheng