[hpsdr] Subject: Re: FFT latency

Wed Apr 8 06:41:00 PDT 2015

Alex,
thanks for the explanation, since people are mixing up different 
sources of delay. But are the FFT / iFFT and the processing in the 
frequency domain really the main driver of relevant delay on today's PCs?

According to http://www.fftw.org/speed/ a modern PC can do a 1024-point 
FFT / iFFT in about 10 - 20 microseconds. Applying some filtering in the 
frequency domain, like spectral subtraction or multiplication with a 
transformed filter kernel, will add a few more microseconds.
But that is all far below Peter's requirement of '20 ms is acceptable'. 
Even if the buffer size of 1024 has to be doubled to 2048 to allow 
overlap/add processing, the overall delay for transform & process is in 
microseconds rather than milliseconds.
So where is the delay coming from? I think it is inherent in the design 
of block-based processing, plus other factors that depend on whether 
processing is done in the time or in the frequency domain.

a) Processing in time domain sample by sample
If the samples were available one by one (e.g. directly from an ADC), 
the only delay here would be the time to fill up the filter kernel until 
every new sample is included in the output calculation/convolution. If 
sampled at 384k, the delay with a 256-tap filter kernel will be below 
1 ms (relevant, but no show stopper). But Hermes does not work this way. 
The samples are delivered in blocks via USB/Ethernet, even if all 
processing is strictly in the time domain.
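
To put a number on the 'below 1 ms' figure, here is a minimal Python 
sketch. It assumes the kernel fill delay is simply the number of taps 
divided by the sample rate; the function name is made up for this 
illustration.

```python
def kernel_fill_delay_ms(taps: int, sample_rate_hz: float) -> float:
    """Time for `taps` consecutive samples to arrive, in milliseconds.

    Assumption: samples arrive one by one at `sample_rate_hz`, so the
    kernel is full after taps / sample_rate seconds.
    """
    return 1000.0 * taps / sample_rate_hz


if __name__ == "__main__":
    # 256-tap kernel at 384 kHz, the case mentioned above
    print(f"{kernel_fill_delay_ms(256, 384_000):.3f} ms")  # 0.667 ms
```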

b) Processing in frequency domain block based
As shown above, the transformation and processing should take a few tens 
of microseconds, but the samples have to be available as a complete 
block before an FFT / iFFT can be done, and waiting for that block is 
what drives the delay into milliseconds. Even if these blocks get small, 
the waiting time is much longer than the overall transform and 
processing time. Unless the sample rate gets very high...
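
A rough Python sketch of that comparison follows. The 20 us compute time 
is the upper end of the FFTW figure quoted above; the 1024-sample block 
at 384k is only an example case, and the function name is hypothetical.

```python
def block_latency_us(block_size: int, sample_rate_hz: float,
                     compute_us: float) -> tuple[float, float]:
    """Return (fill time, fill time + compute time) in microseconds.

    Assumption: a block can only be transformed once all of its samples
    have arrived, so the fill time is block_size / sample_rate.
    """
    fill_us = 1e6 * block_size / sample_rate_hz
    return fill_us, fill_us + compute_us


if __name__ == "__main__":
    compute_us = 20.0  # assumed transform + processing time
    fill, total = block_latency_us(1024, 384_000, compute_us)
    share = 100.0 * compute_us / total
    print(f"fill {fill:.0f} us, total {total:.0f} us, "
          f"compute share {share:.1f} %")
    # fill 2667 us, total 2687 us, compute share 0.7 %
```

Under these assumptions the transform itself is well under 1 % of the 
total; almost all of the time is spent waiting for the block to fill.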

If I understood correctly, Hermes pushes out 512-sample blocks via 
USB/Ethernet at a sample rate of 384k/192k/96k etc. (one panadapter). So 
there is no way to eliminate this kind of delay unless the overall HW 
design is changed. For a very low latency receiver, either the sample 
rate (of the transferred data) has to go up much more and/or the block 
sizes have to be much smaller.  This would call for a kind of streaming 
mode...
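
The trade-off can be sketched in Python as below. The 512-sample block 
size is the figure assumed above (not verified against the Hermes 
firmware), the 1 ms target is an arbitrary example, and both function 
names are made up. Only the time to collect one block is counted; 
transport and audio API delays come on top.

```python
def block_duration_ms(block_size: int, sample_rate_hz: float) -> float:
    """Time needed to collect one block of samples, in milliseconds."""
    return 1000.0 * block_size / sample_rate_hz


def max_block_size(sample_rate_hz: float, target_ms: float) -> int:
    """Largest block (in samples) that fills within target_ms."""
    return int(sample_rate_hz * target_ms / 1000.0)


if __name__ == "__main__":
    for rate in (384_000, 192_000, 96_000):
        print(rate,
              round(block_duration_ms(512, rate), 3),  # ms per 512 block
              max_block_size(rate, 1.0))               # samples in 1 ms
    # 384000 1.333 384
    # 192000 2.667 192
    # 96000 5.333 96
```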

In my experience with SDR based on C++ and LabView, most of the delay 
arises on the PC, from the audio APIs and from a lot of 'side' 
processing to display spectra etc. (which requires larger audio buffers).

PS: I have no clue about FPGA programming yet and have not taken a close 
look at PowerSDR, so I apologize if something is wrong... but a more 
flexible FPGA configuration in Hermes that allows changing the Ethernet 
packet buffer sizes and larger windows (= higher sample rate) would 
definitely be very nice.

73 Michael DG5MK