Problems with ngsolve.ngscuda.UnifiedVector

Hi everyone,

I compiled the latest version of ngsolve with CUDA support
and tried to perform some simple calculations using
UnifiedVectors (see the screenshot below).
I start with a numpy array and use it to generate a corresponding
UnifiedVector. This correctly allocates memory on the GPU.
Then I define u2 = u + u, followed by u2e = u2.Evaluate(), which
also allocates some memory on the GPU.
The next step, u2e.UpdateHost(), copies the result from the GPU
back to the CPU. But when I print u2e, all components are 0.
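In code, the steps above look roughly like this (a minimal sketch: the array values and the import guard are my assumptions for illustration; the UnifiedVector calls follow the description in this post):

```python
import numpy as np

a = np.arange(4, dtype=float)   # host data (values chosen for illustration)
expected = a + a                # what u2e should contain after Evaluate()

try:
    from ngsolve.ngscuda import UnifiedVector

    u = UnifiedVector(a)        # allocates memory on the GPU
    u2 = u + u                  # expression, not yet evaluated
    u2e = u2.Evaluate()         # evaluates the expression on the device
    u2e.UpdateHost()            # copy the result back to the CPU
    print(u2e)                  # should match `expected`, but prints all zeros here
except ImportError:
    # ngsolve was built without CUDA support
    print("ngsolve.ngscuda not available; expected result:", expected)
```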
Any idea what I'm doing wrong?
Regards,

Martin.

Hi Martin,

I can only tell you that your script works on my side.
How about other tests, for example:

u *= 2
print (u)

Are you sure you compiled with device compute capability 7.5 included?
Are you able to do some debugging in the C++ code?

Joachim

Hi Joachim,

thank you very much for your reply.
Explicitly setting the Compute Capability did the job.
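For anyone hitting the same issue, setting the compute capability explicitly in a CMake build might look like this (a sketch under assumptions: USE_CUDA as the CUDA switch and the exact flag names may differ between NGSolve versions; 75 corresponds to compute capability 7.5):

```shell
# Reconfigure the NGSolve build with CUDA enabled and an explicit
# compute capability (75 = CC 7.5); flag names may vary per version.
cmake .. -DUSE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=75
make -j install
```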
By the way:

u *= 2

was also working without the compute capability set.
Regards,

Martin.

Dear Joachim, dear Martin,

I was wondering whether using the GPU is also faster with complex numbers. I followed the first example in this tutorial and only added complex=True to fes = H1(mesh, order=2, dirichlet=".*"). The time using the GPU is even higher. I installed NGSolve using pip (version 6.2.2305).
Note: without complex=True, using the GPU is faster.
Thank you so much in advance.

Best regards,
Vien

Hi Vien,

I ran the tutorial problem using ngsolve 6.2.2307 and

fes = H1(mesh, order=2, dirichlet=".*", complex=True)

The line

fdev = f.vec.CreateDeviceVector(copy=True)

creates the message

No device creator function, creating host vector

As far as I could check, there really is no transfer to the device,
and the subsequent calculations are performed on (a single core of)
the CPU.
So my guess is that the complex data types are not yet implemented on the GPU.
But I am sure Joachim can give you more reliable information concerning
your problem.
Regards,

Martin.


Hi Vien,
not all code parts are GPU-optimized; otherwise the code falls back to the CPU, which can add overhead. The focus for now is on applications and methods where the GPU promises significant performance gains.
Best
Christopher
