[hpsdr] [Flexradio] PS3 support in Linux 2.6 - Cool!

Eric Blossom eb at comsec.com
Thu Dec 7 15:03:43 PST 2006


> 200 GFLOPS on  << ONE>>  of the 8 SPE.   As always,  our needs will 
> advance to meet the available resources.

Uhh, don't think so.

I'm pretty sure it goes like this (spent most of the last week on this stuff):

  3.2 GHz SPE clock
  In a single clock it can issue either 1 single-precision SIMD mult or add
      or
  In a single clock it can issue a fused single-precision SIMD mult+add

  The SIMD instruction operates on 4 single-prec floats.
  [Latency is 6 clocks, but can issue every clock]

Thus each SPU does 3.2e9 * 4 = 12.8 GFLOPS.  Or, if the only instruction
you ever execute is fused SIMD mult+add you get twice this, 25.6 GFLOPS.

FWIW, in addition to issuing one floating-point SIMD op per clock, the
SPU can also issue another kind of instruction in its other pipe
(typically a memory op), but that's irrelevant to the FLOP calcs.

Thus the peak rate (assuming that _all_ you do is single-precision
mult+add -- love those FIRs) is 25.6 GFLOPS/SPU * 8 SPUs = 204.8,
call it 205 GFLOPS.

A more reasonable starting point (but still a stretch) is to assume
that you can issue one non-fused SIMD mult or add per cycle, and
that gives you about 100 GFLOPS (12.8 * 8 = 102.4).  Of course, as
soon as you're not able to fully vectorize, you lose another factor
of 4, so call it 25 GFLOPS.
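
For anyone who wants to check the arithmetic, here is a rough sketch of
the numbers above in Python.  The function and constant names are made
up for illustration; the only inputs are the figures already quoted
(3.2 GHz clock, 4-wide single-precision SIMD, 8 SPUs, and 2 flops per
lane for a fused mult+add).

```python
# Back-of-the-envelope peak-rate arithmetic for the figures in this post.
# All names here are illustrative, not part of any Cell SDK.

CLOCK_HZ = 3.2e9       # SPE clock
SIMD_LANES = 4         # single-precision floats per SIMD instruction
SPUS_PER_CHIP = 8      # SPUs counted per Cell chip in this estimate


def peak_gflops(flops_per_lane, lanes=SIMD_LANES, spus=1):
    """Peak GFLOPS assuming one floating-point instruction issues every clock.

    flops_per_lane is 1 for a plain mult or add, 2 for a fused mult+add.
    """
    return CLOCK_HZ * lanes * flops_per_lane * spus / 1e9


per_spu_plain = peak_gflops(1)                          # plain SIMD mult or add
per_spu_fused = peak_gflops(2)                          # nothing but fused mult+add
chip_fused = peak_gflops(2, spus=SPUS_PER_CHIP)         # the "big number"
chip_plain = peak_gflops(1, spus=SPUS_PER_CHIP)         # no fused ops
chip_scalar = peak_gflops(1, lanes=1, spus=SPUS_PER_CHIP)  # not vectorized

if __name__ == "__main__":
    print(f"per SPU, plain SIMD : {per_spu_plain:6.1f} GFLOPS")
    print(f"per SPU, fused only : {per_spu_fused:6.1f} GFLOPS")
    print(f"chip, fused only    : {chip_fused:6.1f} GFLOPS")
    print(f"chip, plain SIMD    : {chip_plain:6.1f} GFLOPS")
    print(f"chip, scalar only   : {chip_scalar:6.1f} GFLOPS")
```

These are issue-rate ceilings only; they say nothing about memory
traffic or how much of a real workload actually vectorizes.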

My gut sense is that we ought to be able to sustain something in the
25 - 50 GFLOPS range (per Cell chip, not per SPU) for typical SDR
workloads.  The 2-way blade of course doubles this.

Hope you didn't bet the farm on that big number ;)

Eric
