[hpsdr] Announcement of a CudaSharpDSP package for HPSDR: doing parallel DSP processing on your GPU

Thu Apr 8 13:42:53 PDT 2010

On Apr 8, 2010, at 4:25 PM, Hermann wrote:

> Thanks, Jeremy! In one of the Teamspeak sessions I heard about your
> OpenCL implementation. I am also very interested in this, since OpenCL
> is able to run on the Cuda architecture and is also platform
> independent!

My "implementation" isn't very far along yet.  I have some code that finds the available OpenCL devices and enumerates them.  I'm still getting my head around how to parallelize some of this stuff, and I'm hoping that looking at some of your ideas will spur some of my own.

I'm not familiar with CUDA, so I can't say much about it.  The nice things about OpenCL are:
1)  It's cross-platform with regard to both graphics card manufacturers and operating systems.  It's an open spec maintained by the Khronos Group, the same folks that do OpenGL.
2)  The compute units don't have to be graphics cards.  In fact, Apple recommends developing on the main processor first, because the debugging tools are a lot better.  When you deploy it, though, you can have both the graphics card and the CPU doing pieces of the FFT (or whatever else you're doing).  IBM has OpenCL running on some of their Cell hardware[1], and there is apparently an idea of building a Cell-based accelerator card for such computing.
3)  If you set up your buffers correctly, you can actually share them between OpenCL and OpenGL.  So, for the panadapter and waterfall, you can upload the raw data from Ozy, process it on the graphics card, and display it without the data ever leaving graphics card memory.

But, honestly, my first priority is to implement the DSP functions with the vDSP[2] calls that Apple includes in Mac OS X.  The reason is that our datasets are fairly small in block size compared to the large 3D datasets that CUDA and OpenCL were designed for.  We're on the "small" side of what the designers envisioned.  The data only gets huge when you look at it over time, but if you start using large block sizes, you start introducing excessive latency, because a whole block has to arrive before it can be processed.  So, I'm interested in testing how much faster Apple's OpenCL FFT implementation[3] is than the vDSP implementation on our size of datasets, if at all.

[1] http://www.alphaworks.ibm.com/tech/opencl
[2] http://developer.apple.com/Mac/library/documentation/Performance/Conceptual/vDSP_Programming_Guide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40005147
[3] http://developer.apple.com/mac/library/samplecode/OpenCL_FFT/Introduction/Intro.html

> 
> That's great!
> 
> 
> vy 73, Hermann
> DL3HVH
> 
> 
> 
> On Thu, Apr 8, 2010 at 6:31 PM, Jeremy McDermond <mcdermj at xenotropic.com> wrote:
>> This is great to hear Hermann.  I'm looking at doing some similar things
>> with OpenCL, and having some existing code out there will definitely be of
>> assistance.
>> 
>> --
>> Jeremy McDermond (NH6Z)
>> Xenotropic Systems
>> mcdermj at xenotropic.com

--
Jeremy McDermond (NH6Z)
Xenotropic Systems
mcdermj at xenotropic.com
