[hpsdr] Blackfin 32*32 bits multiply

Tue Aug 21 18:21:26 PDT 2007

> On 21/08/2007, Chris Donzelot wrote:
> 
> > There are some details in the Blackfin Hardware Reference.
> > The 32-bit multiply is microcoded.
> > The dual MAC operation only allows doubling MAC throughput: you can
> > input two 32-bit scalars into both MACs to double throughput.
> > One of the two MACs has a shifter just after the accumulator, and the
> > MAC output can be a standard 32-bit register.
> > The 32-bit MAC will be slower than 300 MMAC/s @ 600 MHz (four 16x16
> > MACs + shift & 32-bit addition).
> >
> 
> I am not an expert on the Blackfin, but I have been reading up on it.
> Some Blackfin instructions can be performed in parallel, thereby
> reducing the number of cycles.
> 
> If you look at Analog Devices Engineer-to-Engineer Note EE-186
> (http://www.analog.com/UploadedFiles/Application_Notes/52064380701163EE186.pdf),
> you will see that it is possible to do a 32 x 32 multiply with 31-bit
> accuracy in 2 instruction cycles.
> 
> Also in the same document are examples of 32-bit and 31-bit accurate
> FIR implementations, with a calculation of the number of cycles. This
> may help estimate the relative performance of the Blackfin against a
> Pentium. I do not know the equivalent number of cycles for a
> Pentium-type processor; perhaps someone else could point to a source
> for these.
> 
> Of course, the time taken to carry out a 32-bit multiply is only part
> of the story. Perhaps a better solution would be to choose a particular
> DSP function or set of functions required and do a direct comparison.
> 
> Chris Down
> G8MXW
> 
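
For anyone wondering where a "31 bit accuracy" figure can come from,
here is a minimal Python sketch (not Blackfin code). It assumes the
usual trick of splitting each 32-bit operand into 16-bit halves and
dropping the low x low partial product; I have not confirmed that this
is exactly what EE-186 does, and the function names are made up for
illustration.

```python
def split(x):
    """Split a signed 32-bit value into a signed high half and an unsigned low half."""
    return x >> 16, x & 0xFFFF


def mul32_high_exact(a, b):
    """Upper 32 bits of the exact 64-bit signed product."""
    return (a * b) >> 32


def mul32_high_approx(a, b):
    """Upper 32 bits from three 16x16 partial products (low x low is dropped)."""
    ah, al = split(a)
    bh, bl = split(b)
    cross = ah * bl + al * bh      # partial products weighted by 2**16
    return ah * bh + (cross >> 16)


a, b = 0x12345678, -0x0FEDCBA9
print(mul32_high_exact(a, b) - mul32_high_approx(a, b))
```

The dropped term is smaller than 2**32, so the approximate result is
at most one LSB below the exact upper word.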

A 32x32 bit multiply produces a 64-bit result. In the
case of the Blackfin 32x32 bit multiply (I am
referring to the built-in 32x32 bit multiply, not one
that you would write yourself as a macro) the result
can only be saved to a 32-bit register. This means
that you would need to scale down either the
coefficients or the data in order to prevent overflow
on storing the multiplier result, which really defeats
the purpose of having a 32x32 bit multiply.
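
To make the overflow concrete, here is a minimal Python sketch (not
Blackfin code). It assumes the stored result simply keeps the low 32
bits of the product; the helper names wrap_int32 and mul32_low are made
up for illustration, and the actual overflow behavior of the built-in
instruction should be checked in the hardware reference.

```python
def wrap_int32(value):
    """Keep only the low 32 bits and reinterpret them as signed (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def mul32_low(a, b):
    """Model of a 32x32 multiply whose result must fit a 32-bit register."""
    return wrap_int32(a * b)


sample = 1 << 23            # magnitude near full scale for a 24-bit converter
coeff = 1 << 20             # a coefficient much wider than 8 bits
full = sample * coeff       # exact product, far too big for 32 bits
stored = mul32_low(sample, coeff)
print(full, stored)
```

Small operands such as 3 and 5 survive intact; it is only when the two
operand widths add up to more than 32 bits that the stored value stops
matching the true product.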

Most integer DSPs have accumulators that are even
larger than the multiply result, which allows the
intermediate sum-of-products in an FIR filter, for
example, to grow larger than the size of the multiply
result (64 bits in the case of a 32x32 bit multiply).
The final FIR result may be within range of the size
of the multiplier output width, while the intermediate
sum could have grown larger than this width. The extra
accumulator width prevents saturation from occurring
during the sum-of-products. The Blackfin does indeed
do this for its 16x16 bit multiply by providing a
40-bit accumulator rather than a 32-bit accumulator
(recall the Motorola DSP56000 with its 24x24
multiplier and 56-bit accumulator).
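
The effect of the extra accumulator bits can be shown with a short
Python sketch. It models an accumulator that saturates after every
multiply-accumulate step; the function names and the test vectors are
my own, chosen so that the intermediate sum exceeds 32 bits while the
final sum does not.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed two's-complement integer of `bits` bits."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def mac(samples, coeffs, acc_bits):
    """Sum of products, saturating the accumulator to acc_bits after every step."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = saturate(acc + x * h, acc_bits)
    return acc


# 16-bit data and coefficients: three large positive products, then three negative.
samples = [32767] * 6
coeffs = [32767, 32767, 32767, -32767, -32767, -32767]
exact = sum(x * h for x, h in zip(samples, coeffs))

print(exact, mac(samples, coeffs, 40), mac(samples, coeffs, 32))
```

With a 40-bit accumulator the result matches the exact sum; with only
32 bits the accumulator clips on the third product and the final
answer is wrong even though the true result fits easily.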

A 32x32 bit multiply with a 32-bit result is only
useful if you know ahead of time that either your data
or your coefficients will not use the full 32 bits.
When using 24-bit data converters, you would have to
use 8-bit FIR coefficients (24 + 8 = 32 bits of
product) in order to guarantee no overflow of the
multiplier! I really think that application note
EE-186 is wishful thinking on ADI's part in order to
try to sell its Blackfin processor to audio folks.
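
The 8-bit figure can be checked with a few lines of Python. This is
just worst-case arithmetic on signed two's-complement ranges; the
function names are hypothetical.

```python
def product_fits(data_bits, coeff_bits, result_bits=32):
    """True if every signed data_bits x coeff_bits product fits in result_bits."""
    # The largest magnitude comes from multiplying the two most negative values.
    worst = (-(1 << (data_bits - 1))) * (-(1 << (coeff_bits - 1)))
    return worst <= (1 << (result_bits - 1)) - 1


def max_coeff_bits(data_bits, result_bits=32):
    """Widest signed coefficient that can never overflow the result register."""
    bits = 1
    while product_fits(data_bits, bits + 1, result_bits):
        bits += 1
    return bits


print(max_coeff_bits(24))   # coefficient width usable with 24-bit data
print(max_coeff_bits(16))   # coefficient width usable with 16-bit data
```

For 24-bit data the widest safe coefficient is 8 bits, and for 16-bit
data it is 16 bits, which is why the part looks so much more
comfortable as a 16-bit DSP.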

The Blackfin looks like a great 16-bit DSP, but for
high resolution audio work (or baseband SDR using
24-bit converters) I think a better choice would be a
DSP that has native 32-bit arithmetic. 

Greg
WD9DEX
