[hpsdr] Cascading ADC/FPGA Pairs

Fri Aug 21 23:24:22 PDT 2009

Steve -

The first part of your development reminded me of the situation that came
and went prior to graphics cards. Rob Pike, then at AT&T, had just
demonstrated bit-blit technology for updating frame buffers on the user
screen, a utility we take for granted today. At a computer conference,
SIGGRAPH, there was an outcry for dual-ported video RAM that would allow
update of the user display at the same time the raster image was being laid
down in RAM. Bandwidth-wise this is similar to the current discussion except
that a 32-bit (r,g,b,a) x 1024 x 1280 2-D image is replaced by a 16-bit x
n-sample 1-D chunk of spectrum. The dual-ported RAM came true, then Jim Clark
built the 4x4 matrix multiplier chip. Then SGI, nVidia, you know the rest,
all before the multiply-accumulate on today's pipelined DSPs. There has
been talk afoot on several of the newsgroups about using graphics cards as
processors for SDR, but whether the 2D image processing architecture will
map to the 1D RF signal processing problem is an open question - in my mind
at least. The two beasts are a little different.

The use of memory-mapped displays enabled "double-buffering", an important
display technique that allowed a new image to be built before the old one
was removed. This eliminated flicker and temporal aliasing artifacts. The
SDR analog of this is a large 1D ring buffer for samples that is
continuously updated and indexed using modular arithmetic. The progression
of SDR hardware looks more than a little like the technical development
track that took place in graphics and image processing. This suggests
possible next steps.
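
To make the ring-buffer idea concrete, here is a minimal sketch in Python. The class name RingBuffer and its methods are made up for illustration and do not come from any HPSDR code; a real implementation would live in the FPGA or in C on preallocated memory.

```python
class RingBuffer:
    """Fixed-size 1D sample buffer indexed with modular arithmetic.

    Hypothetical illustration only, not taken from any SDR code base.
    """

    def __init__(self, size):
        self.size = size
        self.data = [0] * size
        self.count = 0  # total number of samples ever written

    def write(self, sample):
        # Wrap the write index so the newest sample overwrites the oldest.
        self.data[self.count % self.size] = sample
        self.count += 1

    def latest(self, n):
        """Return the n most recent samples, oldest first."""
        n = min(n, self.count, self.size)
        start = self.count - n
        return [self.data[(start + i) % self.size] for i in range(n)]


buf = RingBuffer(8)
for s in range(11):       # write 11 samples into an 8-slot buffer
    buf.write(s)
recent = buf.latest(4)    # [7, 8, 9, 10]
```

The reader never has to wait for the writer to finish a whole frame, which is roughly what double-buffering bought the display people.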

Speaking of that, decimation creates image and aliasing problems of a
different kind. It is preferable from a sampling point of view to average
samples in a hierarchical pipeline in powers of two, from RF carrier
frequency down to the size of the "IF" FFT buffer. Using plain decimation
(simply discarding samples) in place of tiered averaging costs a major
reduction in signal to noise ratio.
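
A small sketch of the difference, in Python. The function names are hypothetical, and pairwise averaging is only the crudest possible low-pass stage (a real design would more likely use half-band or CIC filters at each tier), so treat this as an illustration of the structure rather than a recommended filter.

```python
def halve_by_averaging(samples):
    """One tier: average adjacent pairs, halving the sample rate."""
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]


def tiered_average(samples, tiers):
    """Apply `tiers` halving stages, for a total rate reduction of 2**tiers."""
    for _ in range(tiers):
        samples = halve_by_averaging(samples)
    return samples


def plain_decimate(samples, factor):
    """Keep every `factor`-th sample and discard the rest (no filtering)."""
    return samples[::factor]


ramp = [1, 3, 5, 7, 9, 11, 13, 15]
averaged = tiered_average(ramp, 2)   # [4.0, 12.0], every input contributes
dropped = plain_decimate(ramp, 4)    # [1, 9], six of eight inputs ignored
```

Both reduce the rate by four, but only the tiered version lets every input sample contribute to the output.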

The last part of your development suggests to me a role for multi-core
processors, if in fact it is possible to dedicate one core to FFT, another
to signal conditioning, etc. As you have pointed out there are bandwidth
issues there, manifested by the number of pinouts. But if this can take
place on the CPU with a minimum of I/O...

Thus a new question is generated. What is the SDR analog of dual-ported
video ram?

- Van / AE5CC / wdv.com

-----Original Message-----
From: Steve Bunch [mailto:steveb_75 at ameritech.net] 
Sent: Friday, August 21, 2009 9:52 PM
To: L. Van Warren
Cc: hpsdr at openhpsdr.org
Subject: Re: [hpsdr] Cascading ADC/FPGA Pairs

The maximum FPGA clock rate is a few hundred MHz, where the exact  
magnitude of "few" is (somewhat exponentially) proportional to  
dollars.  As Graham points out, you can take in 5 (or more) samples  
into the FPGA per clock.  Then what?  How do you get it into the  
processor to do anything useful to it?  There's multi-channel PCIe,
InfiniBand, proprietary chip busses intended for I/O chips, or a
memory interface.  These are all challenging, but there are not many
other choices if you're trying to use an off-the-shelf 3GHz CPU chip
-- they just don't put a lot of GPIO pins on them.  If you were to,
say, interface your FPGA to look like a memory DIMM, and disable
caching and interleaving for that bank, you might be able to get the
data in without too many clocks lost -- it would look like a cache
miss, basically.  It's a hard project, but not impossible.

Any question of what a CPU can do is best answered by writing the  
inner loop of the most critical function(s) -- the data capture, or  
the copy to memory, or decimating it down, or whatever.  If you don't  
know the cost of these loops ahead of time, you have to write all of  
your inner loops and figure out what they cost to run.  Then you  
compare that with your budget -- how many instructions do you get per  
sampling interval?

A 3GHz processor executes, in round numbers, one instruction per every  
couple of 1/3ns clocks, so maybe (running out of cache, with no misses  
to memory, few mispredicted branches, no multi-cycle instructions, and  
no ugly long data dependency chains) 1/2 of a nanosecond per
instruction, per CPU.  Some of those instructions can handle wider  
operands (SIMD instructions, e.g., that can do several independent  
sums in parallel), which can reduce the total number of clocks it  
takes to process them IFF SIMD processing is applicable (one good  
reason to take in multiple samples at once!).

So you have on the order of 400 instructions to process the 5 x 16  
samples per 200ns you get from the FPGA.   Those instructions have to  
pick it off the hardware interface, do any processing you can afford  
to do, and dispose of it (say, to a program running on one of the  
other CPUs).
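
The budget arithmetic above can be written out as a short Python sketch. The function names are made up for illustration, and the sample count is left as a parameter because "5 x 16" can be read either as 5 samples of 16 bits or as 80 samples; both readings are shown.

```python
def instruction_budget(interval_ns, ns_per_instruction):
    """Instructions available in one delivery interval on one CPU."""
    return interval_ns / ns_per_instruction


def budget_per_sample(interval_ns, ns_per_instruction, samples_per_interval):
    """Instructions available per sample (sample count is an assumption)."""
    return instruction_budget(interval_ns, ns_per_instruction) / samples_per_interval


# Figures from the text: 200 ns per delivery, about 1/2 ns per instruction.
budget = instruction_budget(200, 0.5)           # 400.0
if_five_samples = budget_per_sample(200, 0.5, 5)     # 80.0 per sample
if_eighty_samples = budget_per_sample(200, 0.5, 80)  # 5.0 per sample
```

Either way, the per-sample figure is what you compare against the cost of your inner loop.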

Oh, and by the way, what did you want to do with the data?  FFT it  
immediately?  Run some filters?  Such things are likely to take orders  
of magnitude more instructions than the capture did... you are likely  
going to have to first decimate it before you can afford to process  
it.  Getting the data into an FPGA and decimating it before sending it  
to the CPU is a lot simpler, if it's adequate for the purpose in mind  
(e.g., narrow-band communication and a casual view of a wider  
bandwidth).  Play with GNURadio for a while to get a feel for this --  
it's not hard to write a fairly simple-seeming radio processing chain  
that chokes your CPU trying to handle the 32MB/s that the USRP delivers.

As far as OS overhead is concerned, you can't afford it on the  
critical path.  So you could simply dedicate one or more processors to  
the signal capture and processing, and make them unavailable to the OS  
running on one or more other CPUs.  This is a well-known technique --  
we used it in a real-time UNIX product I worked on in the 1980s, and
you can do it today in Windows or Linux by running your code as a
top-priority kernel task with interrupts (other than your own, if you use
any) disabled.
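
A much weaker user-space cousin of this idea is CPU affinity, sketched below in Python. This is not the kernel-task technique described above: it only restricts where this one process runs, and it neither keeps the OS off that core nor disables interrupts. The function name is hypothetical; os.sched_getaffinity and os.sched_setaffinity are standard-library calls available on Linux only, so the sketch returns None elsewhere.

```python
import os


def pin_to_one_core():
    """Restrict the current process to a single core.

    Returns the chosen core number, or None if affinity control is not
    available (e.g. Windows, macOS) or not permitted.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        # Pick the highest-numbered core this process is allowed to use.
        core = max(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {core})
    except OSError:
        return None
    return core


core = pin_to_one_core()
```

Keeping everything else off that core takes boot-time or kernel-level configuration, which is outside what this sketch shows.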

Steve

On Aug 21, 2009, at 2:43 PM, L. Van Warren wrote:

> ***** High Performance Software Defined Radio Discussion List *****
>
> If you trace this conversation back, Graham doesn't answer the question,
> which is what is the maximum clock rate of an appropriate FPGA, like the
> Altera Cyclone.
>
> At some point, no matter how much parallelism is in the FPGA, the clock rate
> of an FPGA-based system determines the maximum sampling rate of the signal.
>
> My thought is that if a master CPU running at 3.5 GHz can coordinate the
> final results of a set of slaved (and less expensive) FPGA/ADC pairs then it
> is the clock rate of the CPU and not the FPGA which determines the maximum
> sampling rate of the system.
>
> Whether this is true I don't know, but that is the question that I want to
> see the math for.
>
> - Van / AE5CC / wdv.com
>
>
>
>
> -----Original Message-----
> From: John Nordlund [mailto:ad5fu at earthlink.net]
> Sent: Friday, August 21, 2009 2:31 PM
> Cc: L. Van Warren; fallingstar at cauhf.org
> Subject: Re: [hpsdr] Cascading ADC/FPGA Pairs
>
> even better..
>
> Graham / KE9H wrote:
>> L.Van:
>>
>> That is the beauty of using the FPGA.  For dedicated logic tasks
>> like playing "put and take" with the output of several A->Ds, the
>> FPGA is faster than a CPU, particularly one subject to continuous
>> interruptions such as when a modern OS is involved.  The FPGA
>> can do multiple things in parallel, as opposed to the one thing at
>> a time, in series, that is characteristic of a CPU.  Note that the
>> FPGA the oscilloscope company used is the same Cyclone-III
>> family as HPSDR uses on Mercury.
>>
>> --- Graham
>>
>> ==
>>
>> L. Van Warren wrote:
>>> That was a very interesting post about using multiple low cost ADCs to look
>>> like a higher rate ADC.
>>>
>>> I'm wondering if a high-end CPU, running at say 3 GHz, could coordinate the
>>> traffic coming from multiple ADC/FPGA pairs.
>>>
>>>
>>>
>>>> From: alex <ajbr at btconnect.com>
>>>> To: hpsdr at openhpsdr.org
>>>> Subject: Re: [hpsdr] Cascading A/D Converters
>>>>
>>>> no i think that it would work, you divide the 1ghz into 5 so you have 40
>>>> MHz at 72 deg phase, so each ADC did every 5th sample
>>>>
>>>> you would need a FPGA that worked at 1ghz though
>>>>
>>>>> rstasiak at sympatico.ca wrote:
>>>>>
>>>>>> ... blog which describes a process of cascading five Analog Devices
>>>>>> AD8298-40 (40 MHz) dual ADC's under the control of an Altera FPGA to
>>>>>> get a 1 GHz sample rate system.
>>>>>
>>>>> 73  Alberto  I2PHD
>>>>>
>>>
>>>
>>> Van / AE5CC / wdv.com
>>> _______________________________________________
>>> HPSDR Discussion List
>>> To post msg: hpsdr at openhpsdr.org
>>> Subscription help:
>>> http://lists.openhpsdr.org/listinfo.cgi/hpsdr-openhpsdr.org
>>> HPSDR web page: http://openhpsdr.org
>>> Archives: http://lists.openhpsdr.org/pipermail/hpsdr-openhpsdr.org/
>>>
>>>
>>
>>
>
