Hello, I would like to optimize Joachim’s DG code for acoustics with PML in this link.
The first thing I would like to do is define the phat and uhat variables only in the PML domain, which I can do as follows:
fes_uhat = VectorL2(mesh, order=k, definedon="pmlx|pmly|pmlxy")
But this DG space on a subdomain does not come with the SolveM option. If I run the following line,
it returns the message
mass matrix certainly not diagonal for element N5ngfem13ScalarDummyFEILNS_12ELEMENT_TYPEE10EEE
This seems to be a bug…
The coming nightly will not crash in your test case. Furthermore, the SolveM function now also supports a definedon=region argument, so that the dummy FEs will not be called at all. This still needs some testing.
Please keep us informed about your optimizations …
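The definedon argument mentioned in this reply could be used roughly as in the sketch below. It is untested against any particular nightly: the helper name solve_mass_on_region is made up for illustration, and it assumes that SolveM accepts the keyword arguments vec, rho and definedon and that mesh.Materials(...) returns the matching region.

```python
def solve_mass_on_region(fes, mesh, vec, region_names, rho=None):
    """Apply the inverse mass matrix of `fes` to `vec`, restricted to a region.

    Hypothetical helper: it assumes fes.SolveM takes the keywords vec, rho
    and definedon, and that mesh.Materials(pattern) returns a region object.
    """
    region = mesh.Materials(region_names)
    kwargs = {"vec": vec, "definedon": region}
    if rho is not None:
        kwargs["rho"] = rho  # optional weight, only passed when given
    fes.SolveM(**kwargs)
    return region


# Intended use with NGSolve (not executed here; mesh, k and gfu are assumed):
#   fes_uhat = VectorL2(mesh, order=k, definedon="pmlx|pmly|pmlxy")
#   solve_mass_on_region(fes_uhat, mesh, gfu.vec, "pmlx|pmly|pmlxy")
```

Passing the same material pattern to both the space and SolveM keeps the two restrictions in sync, which is the point of the definedon argument described above.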
To make the code a bit faster, I can only think of these two further changes:
(1) Change the element-wise boundary integration to skeleton-based edge-wise integration. This gives me a small speed-up (I was expecting more, since we are looping over edges now… anyway, this is fine).
(2) Use the compile flag to evaluate the symbolic BFIs more efficiently. (This causes some trouble! It did give me a speed-up when running multi-threaded without MPI, but it slowed the code down when running with MPI, compared with the code without the compile flag.)
Now, I have two questions.
Concerning (1), is there a more efficient way to loop over the edges of a subdomain? Currently, I can only do
I guess I need to flag all edges of the subdomain properly and then use a definedon flag, something like
VOL, skeleton=True, definedon=???
But I don’t know how to do it correctly… If this is too complicated, I might just stick with the element_boundary formulation, since the speed-up from the skeleton-based formulation is quite insignificant.
Concerning (2), I am a bit confused about when to use the compile flag for a speed-up. I’ve read the note
but I am still quite confused.
In my multi-threading test, I found that .Compile(True, True) works better than .Compile(), which didn’t give me any speed-up… What is the difference between them? Should I always stick with .Compile(True, True)? Or (False, True)? Or (False, False)?
Then, the more serious question is: why did the MPI version slow down when using the compile flag? Should I just stick with the form without compile instead?
Concerning the compile question, attached is the code that solves a time-domain wave equation.
Line 42 is the flag (cc) that switches .Compile() on or off.
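A switch like the cc flag might look like the sketch below. The attached script is not reproduced here, and the helper name maybe_compile is hypothetical. The keyword names realcompile and wait are assumed to match the installed NGSolve version; the positional form .Compile(True, True) is deliberately avoided because the meaning of positional arguments may differ between versions.

```python
def maybe_compile(cf, cc, realcompile=False, wait=False):
    """Return cf.Compile(...) when cc is True, otherwise return cf unchanged.

    `cf` is expected to be a coefficient function (or any object with a
    Compile method accepting the keywords realcompile and wait).
    """
    if not cc:
        return cf
    return cf.Compile(realcompile=realcompile, wait=wait)


# Intended use (not executed here; `flux` stands for some coefficient function):
#   flux = maybe_compile(flux, cc=True, realcompile=True, wait=True)
```

Routing every expression through one helper makes it easy to time the same script with and without compilation by changing a single flag.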
I ran the following line,
mpirun -np 64 ngspy testCompiler.py
The elapsed time is about 3.0 s for cc = True and about 1.5 s for cc = False (I got a slower result with Compile(True, True)).
I then ran the following line (multi-threaded with 64 cores).
The elapsed time is about 1.2 s for cc = True and about 2.0 s for cc = False
(I got a faster result with Compile(True, True)).
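To put the two measurements side by side, the reported times can be turned into speed-up factors. The short script below only does arithmetic on the numbers quoted above; the dictionary layout and the function name compile_speedup are illustrative choices.

```python
# Elapsed times in seconds as reported above (64 MPI ranks / 64 threads).
timings = {
    "mpi": {"cc=True": 3.0, "cc=False": 1.5},
    "threads": {"cc=True": 1.2, "cc=False": 2.0},
}


def compile_speedup(run):
    """Ratio t(cc=False) / t(cc=True); values above 1 mean compiling helped."""
    return run["cc=False"] / run["cc=True"]


speedups = {name: compile_speedup(run) for name, run in timings.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
# prints:
#   mpi: 0.50x
#   threads: 1.67x
```

So in these runs the compiled version takes twice as long under MPI, while with threads it is roughly 1.7 times faster.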