[hpsdr] Announcement of a CudaSharpDSP package for HPSDR: doing parallel DSP processing on your GPU

Hermann hvh.net at gmail.com
Thu Apr 8 05:35:13 PDT 2010


Dear All,

this is an early announcement of a new software package for DSP
processing on your PC for HPSDR. After working for a couple of months
on a parallel implementation of the DSP functionality for HPSDR, and
after discussions with, and encouragement from, George, K9TRV, and
Phil, VK6APH, I am now confident that this implementation can be
finished successfully!

The parallel implementation makes use of your Nvidia graphics adapter
(if you have one, of course). Nvidia provides a free framework, named
Cuda, for implementing parallel algorithms on GPUs
(http://www.nvidia.com/object/cuda_home_new.html). Cuda provides an
extension to the C language that can be used in various environments.
Cuda also provides a library, named CuFFT, which is a parallel
implementation of the FFT. On Nvidia's Cuda web pages you can find
links for downloading Cuda and all its documentation. If you want to
know whether your graphics adapter supports Cuda, please take a look
at the Cuda Programming Guide (available here:
http://developer.nvidia.com/object/gpucomputing.html). The Cuda
Toolkit and the Cuda SDK (with very nice computing examples) can be
downloaded here:
http://developer.nvidia.com/object/cuda_3_0_downloads.html. Cuda is
available for Windows (32/64 bit), Linux and MacOS. Some have reported
that Cuda also works on ATI (now AMD) graphics adapters, but I haven't
tried this.


After receiving my Mercury board at the end of 2009, I began to
experiment with Cuda. The first results were encouraging, and
implementing the FFT parts wasn't that difficult. I could easily
implement, in Cuda, the FFT part of the KISS Konsole that computes the
data for visualising the whole spectrum. This alone lowered the CPU
usage enough that I could easily set the sample rate to 192 kHz.

A major drawback, as I learned, is that the current CuFFT release does
not support multithreading. Cuda comes with two programming
interfaces, the Cuda driver API and the runtime API, both of which can
be used for an implementation. CuFFT, however, can only be used with
the runtime API, which currently does not support multithreading.

In order to continue my experiments, I had to drop the KISS Konsole
thread that does all the DSP processing and run the DSP (Cuda)
processing in the main thread instead. I therefore stripped nearly all
the GUI parts out of KISS, leaving only a way to enter a frequency,
enter some basic commands, turn filters on and off, and control the
volume. As a side effect, I now have a nice little KISS Konsole (which
really is a sort of console) that could also serve as a starting point
for a new HPSDR server for various clients (I'm currently implementing
that, too).

Then I continued implementing the rest of the SharpDSP package by
Phil, N8VB. The only big part still missing is the LMS filter (noise
filter), which still gives me a headache because it is hard to
parallelize. But it should be possible. Noise blanking, average noise
blanking and the FFTs already work (FFTW has been dropped completely),
and some other minor things still have to be implemented. On my little
KISS Konsole I can now run at a sample rate of 192 kHz with all noise
blanking on and without any audio distortion (which you can also
achieve with a good CPU, of course).

So, to make a long story a little shorter, the outcome will hopefully
be a CudaSharpDSP package to be used with HPSDR. The CPU still has
some work to do, of course, like moving the Ozy data directly to the
GPU (where the data is converted from 24-bit ADC samples to 32-bit
floats in parallel, i.e. every sample gets its own thread :-) ) and
moving the DSP output (audio) back to Ozy. At the current data rates,
i.e. 2048 bytes per USB call, the performance gain is not very big.
But this could change if we use the upcoming Ozy II with TCP/IP, or
more than one receiver at the same time. Then much more data can be
processed in parallel on the GPU, and the performance gain should be
considerably bigger.

Also, since I am still on a steep learning curve, the current
implementation is highly experimental and probably far from
optimized. It was very unusual for me to think "in parallel" instead
of in "for" and "while" loops, and the memory organization on the
graphics adapter is also a little cumbersome if you want to maximize
bandwidth.

More details on CudaSharpDSP will follow shortly, and I hope to be
able to provide an early version of the HPSDR server with CudaSharpDSP
soon via the Subversion server.

Many thanks to the developers and makers of the great HPSDR hardware,
and to the developers of the KISS Konsole, which really helped me jump
on the bandwagon of SDR and DSP programming! Special thanks to George,
K9TRV, who answered all my emails and questions so patiently (George,
I have a couple more questions...)!

All the best, vy 73's
Hermann

DL3HVH


