[hpsdr] Cascading ADC/FPGA Pairs
L. Van Warren
van at wdv.com
Fri Aug 21 23:24:22 PDT 2009
Steve -
The first part of your development reminded me of the situation that came
and went prior to graphics cards. Rob Pike, then at AT&T, had just
demonstrated bit-blit technology for updating frame buffers on the user
screen, a utility we take for granted today. At a computer conference,
SIGGRAPH, there was an outcry for dual-ported video RAM that would allow
update of the user display at the same time the raster image was being laid
down in RAM. Bandwidth-wise this is similar to the current discussion, except
that a 32-bit (r,g,b,a) x 1024 x 1280 2-D image is replaced by a 16-bit x
n-sample 1-D chunk of spectrum. The dual-ported RAM came true, then Jim Clark
built the 4x4 matrix multiplier chip. Then SGI, nVidia, you know the rest,
all before the multiply-accumulate on today's pipelined DSPs. There has
been talk afoot on several of the newsgroups about using graphics cards as
processors for SDR, but whether the 2D image processing architecture will
map to the 1D RF signal processing problem is an open question
- in my mind at least. The two beasts are a little different.
The use of memory-mapped displays enabled "double-buffering", an important
display technique that allowed a new image to be built before the old one
was removed. This eliminated flicker and temporal aliasing artifacts. The
SDR analog of this is a large 1D ring buffer for samples that is
continuously updated and indexed using modular arithmetic. The progression
of SDR hardware looks more than a little like the technical development
track that took place in graphics and image processing. This suggests
possible next steps.
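A minimal sketch of such a ring buffer, in Python, purely for illustration. The class name SampleRing and its methods are made up here, and a real SDR would use a fixed block of memory shared with the hardware rather than a Python list:

```python
class SampleRing:
    """Fixed-size ring buffer for samples, indexed modulo its capacity."""

    def __init__(self, capacity):
        self.buf = [0] * capacity
        self.capacity = capacity
        self.count = 0  # total number of samples ever written

    def write(self, samples):
        # The write position wraps around, so old samples are overwritten.
        for s in samples:
            self.buf[self.count % self.capacity] = s
            self.count += 1

    def latest(self, n):
        """Return the n most recent samples, oldest first."""
        n = min(n, self.count, self.capacity)
        start = self.count - n
        return [self.buf[(start + i) % self.capacity] for i in range(n)]


ring = SampleRing(8)
ring.write(range(10))   # samples 0..9; the two oldest are overwritten
print(ring.latest(4))   # [6, 7, 8, 9]
```

The reader never waits for the writer to finish a frame, which is the same property double-buffering gave the display.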
Speaking of that, decimation creates image and aliasing problems of a
different kind. It is preferable from a sampling point of view to average
samples in a hierarchical pipeline in powers of two, from RF carrier
frequency down to the size of the "IF" FFT buffer. A major improvement in
signal-to-noise ratio is the consequence of trading decimation for tiered
averaging.
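A rough sketch of the comparison, in Python. It assumes a made-up test signal (a constant level plus white Gaussian noise) and uses plain pairwise averaging at each power-of-two stage, which is only a crude low-pass filter; a practical decimator would use better filters. For uncorrelated noise, averaging 2^k samples lowers the noise standard deviation by a factor of 2^(k/2), while simply dropping samples leaves it unchanged:

```python
import random
import statistics

def drop_decimate(x, stages):
    """Keep every 2**stages-th sample and discard the rest."""
    return x[::2 ** stages]

def tiered_average(x, stages):
    """Halve the rate `stages` times, averaging adjacent pairs each time."""
    for _ in range(stages):
        x = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return x

random.seed(1)
# Assumed test signal: constant level 1.0 plus unit-variance white noise.
noisy = [1.0 + random.gauss(0.0, 1.0) for _ in range(4096)]

dropped = drop_decimate(noisy, 4)     # rate reduced by 16, no averaging
averaged = tiered_average(noisy, 4)   # rate reduced by 16, four tiers

print(len(dropped), len(averaged))
print("noise after dropping: ", statistics.pstdev(dropped))
print("noise after averaging:", statistics.pstdev(averaged))
```

Both outputs run at one sixteenth of the input rate; only the residual noise differs.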
The last part of your development suggests to me a role for multi-core
processors, if in fact it is possible to dedicate one core to FFT, another
to signal conditioning, etc. As you have pointed out, there are bandwidth
issues there, manifested by the number of pins. But if this can take
place on the CPU with a minimum of I/O...
Thus a new question is generated. What is the SDR analog of dual-ported
video RAM?
- Van / AE5CC / wdv.com
-----Original Message-----
From: Steve Bunch [mailto:steveb_75 at ameritech.net]
Sent: Friday, August 21, 2009 9:52 PM
To: L. Van Warren
Cc: hpsdr at openhpsdr.org
Subject: Re: [hpsdr] Cascading ADC/FPGA Pairs
The maximum FPGA clock rate is a few hundred MHz, where the exact
magnitude of "few" is (somewhat exponentially) proportional to
dollars. As Graham points out, you can take in 5 (or more) samples
into the FPGA per clock. Then what? How do you get it into the
processor to do anything useful with it? There's multi-channel PCIe,
InfiniBand, proprietary chip buses intended for I/O chips, or a
memory interface. These are all challenging, but there are not many
other choices if you're trying to use an off-the-shelf 3GHz CPU chip
-- they just don't put a lot of GPIO pins on them. If you were to,
say, interface your FPGA to look like a memory DIMM, and disable
caching and interleaving for that bank, you might be able to get the
data in without too many clocks lost -- it would look like a cache
miss, basically. It's a hard project, but not impossible.
Any question of what a CPU can do is best answered by writing the
inner loop of the most critical function(s) -- the data capture, or
the copy to memory, or decimating it down, or whatever. If you don't
know the cost of these loops ahead of time, you have to write all of
your inner loops and figure out what they cost to run. Then you
compare that with your budget -- how many instructions do you get per
sampling interval?
A 3GHz processor executes, in round numbers, one instruction every
couple of 1/3 ns clocks, so maybe (running out of cache, with no misses
to memory, few mispredicted branches, no multi-cycle instructions, and
no ugly long data dependency chains) 1/2 of a nanosecond per
instruction, per CPU. Some of those instructions can handle wider
operands (SIMD instructions, e.g., that can do several independent
sums in parallel), which can reduce the total number of clocks it
takes to process them IFF SIMD processing is applicable (one good
reason to take in multiple samples at once!).
So you have on the order of 400 instructions to process the 5 x 16
samples per 200ns you get from the FPGA. Those instructions have to
pick it off the hardware interface, do any processing you can afford
to do, and dispose of it (say, to a program running on one of the
other CPUs).
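The budget arithmetic above can be written down as a small Python sketch. The 200 ns interval and 1/2 ns per instruction come from the text; the per-loop costs in the dictionary are made-up placeholders that you would replace with measured numbers from your own inner loops:

```python
def instruction_budget(interval_ns, ns_per_instruction=0.5, cores=1):
    """Instructions available per arrival interval, under ideal conditions."""
    return int(interval_ns / ns_per_instruction) * cores

budget = instruction_budget(200)   # 200 ns between batches from the FPGA
print(budget)                      # 400

# Hypothetical inner-loop costs, in instructions per batch (not measured).
costs = {"capture": 120, "copy_out": 80, "decimate": 150}
spent = sum(costs.values())
verdict = "fits" if spent <= budget else "over budget"
print(spent, "of", budget, "instructions:", verdict)
```

Any cache miss, mispredicted branch or multi-cycle instruction eats into the 400, so the real budget is smaller than this ideal figure.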
Oh, and by the way, what did you want to do with the data? FFT it
immediately? Run some filters? Such things are likely to take orders
of magnitude more instructions than the capture did... you are likely
going to have to first decimate it before you can afford to process
it. Getting the data into an FPGA and decimating it before sending it
to the CPU is a lot simpler, if it's adequate for the purpose in mind
(e.g., narrow-band communication and a casual view of a wider
bandwidth). Play with GNU Radio for a while to get a feel for this --
it's not hard to write a fairly simple-seeming radio processing chain
that chokes your CPU trying to handle the 32MB/s that the USRP delivers.
As far as OS overhead is concerned, you can't afford it on the
critical path. So you could simply dedicate one or more processors to
the signal capture and processing, and make them unavailable to the OS
running on one or more other CPUs. This is a well-known technique --
we used it in a real-time UNIX product I worked on in the 1980s, and
you can do it today in Windows or Linux by running your code as a
top-priority kernel task with interrupts (other than your own, if you
use any) disabled.
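A much weaker user-space approximation of this, sketched in Python for Linux: pin the process to a single CPU with os.sched_setaffinity. This is not the kernel-task technique described above. It does not disable interrupts and does not hide the core from the OS scheduler, so other work can still land on that core. The function name pin_to_one_core is made up for this example:

```python
import os

def pin_to_one_core():
    """Pin this process to a single CPU; return the CPU number or None."""
    if not hasattr(os, "sched_setaffinity"):
        return None                      # call not available on this platform
    try:
        allowed = os.sched_getaffinity(0)    # CPUs we may currently run on
        cpu = max(allowed)                   # arbitrary choice for the sketch
        os.sched_setaffinity(0, {cpu})       # 0 means the calling process
    except OSError:
        return None                      # pinning refused by the system
    return cpu

cpu = pin_to_one_core()
print("pinned to CPU:", cpu)
```

Keeping everything else off that core takes additional system configuration that is outside the scope of this sketch.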
Steve
On Aug 21, 2009, at 2:43 PM, L. Van Warren wrote:
> ***** High Performance Software Defined Radio Discussion List *****
>
> If you trace this conversation back, Graham doesn't answer the
> question, which is what is the maximum clock rate of an appropriate
> FPGA, like the Altera Cyclone.
>
> At some point, no matter how much parallelism is in the FPGA, the
> clock rate of an FPGA-based system determines the maximum sampling
> rate of the signal.
>
> My thought is that if a master CPU running at 3.5 GHz can coordinate
> the final results of a set of slaved (and less expensive) FPGA/ADC
> pairs then it is the clock rate of the CPU and not the FPGA which
> determines the maximum sampling rate of the system.
>
> Whether this is true I don't know, but that is the question that I
> want to see the math for.
>
> - Van / AE5CC / wdv.com
>
> -----Original Message-----
> From: John Nordlund [mailto:ad5fu at earthlink.net]
> Sent: Friday, August 21, 2009 2:31 PM
> Cc: L. Van Warren; fallingstar at cauhf.org
> Subject: Re: [hpsdr] Cascading ADC/FPGA Pairs
>
> even better..
>
> Graham / KE9H wrote:
>> L.Van:
>>
>> That is the beauty of using the FPGA. For dedicated logic tasks
>> like playing "put and take" with the output of several A->Ds, the
>> FPGA is faster than a CPU, particularly one subject to continuous
>> interruptions such as when a modern OS is involved. The FPGA
>> can do multiple things in parallel, as opposed to the one thing at
>> a time, in series, that is characteristic of a CPU. Note that the
>> FPGA the oscilloscope company used is the same Cyclone-III
>> family as HPSDR uses on Mercury.
>>
>> --- Graham
>>
>> ==
>>
>> L. Van Warren wrote:
>>> That was a very interesting post about using multiple low cost ADCs
>>> to look like a higher rate ADC.
>>>
>>> I'm wondering if a high-end CPU, running at say 3 GHz could
>>> coordinate the traffic coming from multiple ADC/FPGA pairs.
>>>
>>>
>>>
>>>> From: alex <ajbr at btconnect.com>
>>>> To: hpsdr at openhpsdr.org
>>>> Subject: Re: [hpsdr] Cascading A/D Converters
>>>>
>>>> no i think that it would work, you divide the 1ghz into 5 so you
>>>> have 40 MHz at 72 deg phase, so each ADC did every 5th sample
>>>>
>>>> you would need a FPGA that worked at 1ghz though
>>>>
>>>>> rstasiak at sympatico.ca wrote:
>>>>>
>>>>>> ... blog which describes a process of cascading five Analog
>>>>>> Devices AD8298-40 (40 MHz) dual ADC's under the control of an
>>>>>> Altera FPGA to get a 1 GHz sample rate system.
>>>>>
>>>>> 73 Alberto I2PHD
>>>>>
>>>
>>>
>>> Van / AE5CC / wdv.com
>>>
>>> _______________________________________________
>>> HPSDR Discussion List
>>> To post msg: hpsdr at openhpsdr.org
>>> Subscription help:
>>> http://lists.openhpsdr.org/listinfo.cgi/hpsdr-openhpsdr.org
>>> HPSDR web page: http://openhpsdr.org
>>> Archives: http://lists.openhpsdr.org/pipermail/hpsdr-openhpsdr.org/
>>>
>>>
>>
>>
>