[hpsdr] Call for Comments and Discussion - OzyII

jeff millar jeff at wa1hco.net
Fri Jul 24 11:19:45 PDT 2009


Henry Vredegoor wrote:
> Hi John, All,
>
> It would be nice to have a ball park figure about what we are talking about
> with respect to _REQUIRED_ bandwidth for the highest bandwidth application.
>
> Call it a kind of design specification.
>   
That's a really good point. Maybe we can find some basis for agreement. 
Here are some calculations of the processing power requirements as a 
function of bandwidth. I'm sure experts in the field can find all sorts 
of opportunities to fine-tune the estimating process, but the general 
point is that it takes a lot of CPU horsepower to process an A/D sample, 
and those processing requirements grow faster than linearly with 
increasing sampling bandwidth.

Let's start with an architecture baseline. The general building blocks 
include
- Antenna, feedline
- Software Defined Radio
  - RF filtering to meet the Nyquist criterion for the A/D converter
  - Wide band A/D converter running at 10-100 MHz
  - FPGA or other custom DSP performing filtering and data rate reduction
- Data connection between SDR and general purpose computer (PC)
- PC
  - Accepts data from the SDR
  - Runs a general purpose OS such as Linux or Windows
  - Implements filtering, demodulation, audio, control

The Softrock class of SDR eliminates the DSP in the SDR but only 
supports about 96 kHz of bandwidth.

Let's talk about the performance capability of the PC running DSP 
algorithms
- Functions include filtering, signal selection, noise blanking, 
demodulation
- Linrad uses a series of FFTs that consume most of the processing
- Linrad consumes about 20% of CPU for 96 kHz bandwidth from a sound card
- Estimate CPU performance: 2 GHz clock, 1 floating point 
instruction/clock = 2 GFLOPS
- Estimate equivalent bit rate: 100 kHz sample rate x 16 bits/sample x 
2 I/Q samples/clock = 3.2 Mbps
- Estimate processing per interface bit or per sample:
  - 2 GFLOPS * 20% load / 3.2 Mbps = about 125 instructions per bit
  - 2 GFLOPS * 20% load / 100k samples/sec = about 4000 instructions 
per sample
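For anyone who wants to play with the numbers, here's the same estimate 
as a short Python sketch. The constants (2 GHz clock, 20% load, 100 kHz 
sample rate) are the ones assumed above; everything else is simple 
arithmetic.

```python
# Back-of-the-envelope estimate of DSP cost per interface bit/sample,
# based on the Linrad example figures quoted above.

cpu_flops = 2e9          # 2 GHz clock, ~1 floating point op per clock
cpu_load = 0.20          # fraction of the CPU Linrad consumes
sample_rate = 100e3      # A/D sample rate, Hz (sound card, ~96-100 kHz)
bits_per_sample = 16
channels = 2             # I and Q

bit_rate = sample_rate * bits_per_sample * channels   # interface bps
used_flops = cpu_flops * cpu_load                     # FLOPS consumed

print(f"interface bit rate : {bit_rate / 1e6:.1f} Mbps")
print(f"instructions/bit   : {used_flops / bit_rate:.0f}")
print(f"instructions/sample: {used_flops / sample_rate:.0f}")
```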

Pay attention to the number of instructions per sample; that's the most 
important number. It's a rough estimate, but it's based on a real 
example (Linrad). It means that it takes a lot of number crunching for 
each A/D sample. Worse, the amount of processing required grows faster 
than linearly with wider IF bandwidth.

Linrad uses a series of FFTs to break down a 96 kHz sample stream into 
SSB or CW bandwidths. The size of the FFTs depends on the bandwidth of 
the incoming sample stream. For a wide band SDR, the size of the FFT 
grows, and the processing power grows by what's called "N log(N)".

As the bandwidth of the SDR grows, the number of FFT bins required to 
break the bandwidth down into digestible chunks grows linearly (by N). 
If we assume that the first FFT breaks the IF bandwidth down into 100 Hz 
bins, it effectively samples a 100 Hz chunk of the band with each FFT. 
To reproduce the signal with an inverse FFT, the FFTs need to repeat 
fast enough to meet the Nyquist criterion for the bandwidth of the 
bin... meaning the FFT has to repeat at greater than 200 Hz, plus some 
factors for overlapping and filter skirts... call it 500 Hz, or 2000 
usec for all the FFT processing.

That gives another way to estimate the CPU performance requirements. The 
FFTW web site has a nice formula, http://www.fftw.org/speed/. The simple 
formula for the PC performance requirement becomes...
- MFLOPS = 5 N log2(N) / (time per FFT in microseconds)
where time per FFT is about 2000 usec (for a 500 Hz repeat rate)

In other words, the size of the FFT grows linearly with IF bandwidth, 
and the processing power required per FFT grows as N * log2(N). We need 
to include the need for 3 FFTs and 50% idle time for the OS.

- 100 kHz BW => 1,000 bins => 3 FFT * 1/50% * 5 * 1000 * log2(1000) / 
2000 = 150 MFLOPS
- 200 kHz BW => 2,000 bins => 3 * 2 * 5 * 2000 * log2(2000) / 2000 = 
329 MFLOPS
- 1 MHz BW => 10,000 bins => 2000 MFLOPS
- 10 MHz BW => 100,000 bins => 25,000 MFLOPS
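The table above can be reproduced with the FFTW-style cost formula. A 
quick sketch, using the assumptions stated above (100 Hz bins, 2000 usec 
per FFT, 3 FFTs, 50% headroom for the OS):

```python
import math

BIN_HZ = 100          # first FFT resolves the IF into 100 Hz bins
FFT_PERIOD_US = 2000  # 500 Hz repeat rate -> 2000 us per FFT
N_FFTS = 3            # assume 3 FFTs in the processing chain
OS_HEADROOM = 2       # 1 / 50%: leave half the CPU idle for the OS

def mflops_required(bandwidth_hz):
    """FFTW-style cost: MFLOPS = 5 N log2(N) / (time per FFT in us)."""
    n = bandwidth_hz / BIN_HZ                   # FFT size = number of bins
    per_fft = 5 * n * math.log2(n) / FFT_PERIOD_US
    return N_FFTS * OS_HEADROOM * per_fft

for bw in (100e3, 200e3, 1e6, 10e6):
    print(f"{bw / 1e6:5.1f} MHz BW -> {mflops_required(bw):8.0f} MFLOPS")
```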

The earlier estimate from Linrad CPU load data shows 2 GFLOPS * 20% 
load = 400 MFLOPS for a 100 kHz bandwidth. That's within a factor of 
three of the 150 MFLOPS estimate above, which is reasonable agreement 
for this kind of estimate.

So, the processing cost at the PC grows if the interface tries to carry 
more bandwidth. Transforming this back into interface speed...
- 100 kHz BW = 100 kHz sample rate, 2 channels, 16 bits per channel = 
3.2 Mbps
- 200 kHz BW = 6.4 Mbps
- 1 MHz BW = 32 Mbps
- 10 MHz BW = 320 Mbps
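For the record, the same interface-speed scaling as a one-liner per row 
(16-bit I/Q pairs at the sample rate, consistent with the 3.2 Mbps 
figure above):

```python
# Interface bit rate = sample rate x 2 channels (I/Q) x 16 bits
for bw_khz in (100, 200, 1000, 10000):
    mbps = bw_khz * 1e3 * 2 * 16 / 1e6
    print(f"{bw_khz:6d} kHz BW -> {mbps:6.1f} Mbps")
```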

So let's put this data into words
- Assume you're willing to run the PC at 50% processing load
- Assume the SDR software is as efficient as Linrad
- Assume the SDR software knows how to take advantage of multicore CPUs
- Assume a fast 2009 PC has about 8000 MFLOPS peak capacity, 4000 
available
- Max bandwidth is then about 2 MHz
- Right now, an SDR interface using 100 Mbps Ethernet can fully load 
an expensive PC
- Gigabit Ethernet speed will become required when IF bandwidths exceed 
about 2 MHz
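Running the cost formula backwards gives that ~2 MHz figure: find the 
widest bandwidth whose FFT cost fits in the ~4000 MFLOPS assumed 
available. A sketch, using the same assumptions as before (100 Hz bins, 
3 FFTs, 50% OS headroom):

```python
import math

def mflops_required(bandwidth_hz, bin_hz=100, fft_period_us=2000,
                    n_ffts=3, os_headroom=2):
    """FFTW-style cost estimate: 5 N log2(N) / (time per FFT in us)."""
    n = bandwidth_hz / bin_hz
    return n_ffts * os_headroom * 5 * n * math.log2(n) / fft_period_us

budget_mflops = 4000      # half of an assumed 8000 MFLOPS 2009 PC
bw = 100e3
while mflops_required(bw + 100e3) <= budget_mflops:
    bw += 100e3           # step up in 100 kHz increments

print(f"max IF bandwidth within budget: {bw / 1e6:.1f} MHz")
```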

Basically, it's a lot cheaper to put a $40 FPGA on the SDR card to 
reduce the bandwidth of the interface to the PC than it is to purchase 
a faster PC or fine-tune the SDR algorithms.

Based on some previous industry experience, processing in an FPGA (for 
those algorithms that fit in an FPGA) costs only about 3% of the cost of 
the same processing power implemented in a general purpose processor.

jeff, wa1hco

