[hpsdr] Beagleboard and Hawkboard development issues.

Selmeczi János jani.ha5ft at freemail.hu
Wed Jul 14 04:00:59 PDT 2010


Hi all,

Over the past few months I have spent some time studying
how I can use one of TI's OMAP processors for digital
signal processing with the HPSDR hardware.
I have a BeagleBoard and an OMAP-L138 Experimenter Kit. I can run
various Linux versions on these boards. Right now I am working on being able
to run programs on the DSP core. I would like to share some of my findings with you.
I hope they will help you get started on these topics.

Before we decide which processor and which board to use, we must
answer the question: what performance can we expect from the
various hardware options? So I did some internet searching, some testing and some computation.

For the ARM core I have found some performance data for the Linux device drivers:

http://processors.wiki.ti.com/index.php/AM35x-OMAP35x-PSP_03.00.01.06_Feature_Performance_Guide
http://processors.wiki.ti.com/index.php/DaVinci_PSP_03.20.00.12_Device_Driver_Features_and_Performance_Guide

The first is for the OMAP3530 and the second is for the OMAP-L138 processor.
These data show that
 - The USB host port performance is enough to read and write a USB disk at 20 Mbyte/s.
 - On the USB OTG port we could do ~50 Mbit/s Ethernet-over-USB communication.
 - Through the OMAP-L138 Ethernet port we could do ~70 Mbit/s Ethernet communication.
 - Finally, the OMAP-L138 could read and write a SATA disk at ~40 Mbyte/s.
 - Unfortunately, there is no performance data for the McBSP port and driver.
The performance data are for the OMAP3530 ARM core running with a 720 MHz clock
and the OMAP-L138 ARM core running at 300/450 MHz.
So the processor load in the performance tests varies according to the clock speed.
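
To put these link speeds into receiver terms, here is a rough sketch in Python.
The stream format is my assumption and is not taken from the driver guides:
one receiver delivers 24-bit I and 24-bit Q samples at 200 kS/s (the sampling
rate used further down for the filter estimate), and protocol overhead is
ignored. The names are only illustrative.

```python
# Measured driver throughputs quoted above, converted to Mbit/s.
LINKS_MBIT = {
    "USB OTG Ethernet-over-USB": 50,
    "OMAP-L138 Ethernet": 70,
    "USB host disk (20 Mbyte/s)": 20 * 8,
    "OMAP-L138 SATA (40 Mbyte/s)": 40 * 8,
}

# Assumptions, not from the driver guides: one receiver delivers
# 24-bit I and 24-bit Q samples at 200 kS/s, with no protocol overhead.
SAMPLE_RATE = 200_000
BITS_PER_IQ_SAMPLE = 2 * 24


def stream_mbit(sample_rate=SAMPLE_RATE, bits=BITS_PER_IQ_SAMPLE):
    """Raw bit rate of one I/Q stream in Mbit/s."""
    return sample_rate * bits / 1e6


def streams_per_link(link_mbit):
    """Whole number of raw I/Q streams that fit into a link."""
    return int(link_mbit // stream_mbit())


if __name__ == "__main__":
    print(f"one stream: {stream_mbit():.1f} Mbit/s")
    for name, mbit in LINKS_MBIT.items():
        print(f"{name:28s} {streams_per_link(mbit)} streams")
```

Under these assumptions one stream needs 9.6 Mbit/s, so the ~50 Mbit/s and
~70 Mbit/s links would carry about 5 and 7 raw streams. Real protocol overhead
and the processor load would reduce that.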

Next I looked at FFT performance. First I established a baseline using the FFTW
benchmark on a PC. The code I used is from

http://www.fftw.org/benchfft/benchfft-3.1.tar.gz

I ran the test on an HP dc5750M PC with an Athlon AX2 4600+ processor and 4 Gbyte of memory.
The operating system was 32-bit Ubuntu 10.04.
The results for a complex, in-place, forward FFT:

size	DP time (us)	SP time (us)
2k	51		24
4k	120		57
8k	371		147
16k	975		424
32k	2608		1205
64k	6604		3160

For the TI DSPLIB FFT and the ARM NEON fftw3 performance I did some computation based on the following:

http://focus.ti.com/dsp/docs/dspplatformscontento.tsp?sectionId=2&familyId=1622&tabId=2431
http://gsoc2010-fftw-neon.blogspot.com/

The results for a complex/forward FFT are the following:

size	C674x SP	C674x fix	C64x+ fix	fftw3 ARM Neon
2k	72		28		16		1126
4k	144		62		36		2458
8k	335		133		77		5325
16k	670		287		165		11469
32k	1530		615		355		24576
64k	3059		1311		756		52429

- All times are in microseconds.
- For the C674x I used a 300 MHz clock speed (Hawkboard).
- For the C64x+ I used a 520 MHz clock speed (BeagleBoard).
- For the DSP, all data and program instructions are assumed to be in cache.
- For the DSP, the above results are the shortest possible execution times.
  In practice, the real time could be 3-10 times longer.
- For the ARM NEON, a 5x speed improvement is expected.
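
The two tables can be compared directly. The small Python sketch below only
divides the numbers already listed above. The one thing it adds is the
projection of the NEON column, which assumes that the expected 5x improvement
applies equally at every FFT size.

```python
SIZES = ["2k", "4k", "8k", "16k", "32k", "64k"]

# Times in microseconds, copied from the two tables above.
PC_SP = [24, 57, 147, 424, 1205, 3160]
C674X_SP = [72, 144, 335, 670, 1530, 3059]
C64XP_FIX = [16, 36, 77, 165, 355, 756]
NEON_NOW = [1126, 2458, 5325, 11469, 24576, 52429]

NEON_SPEEDUP = 5.0  # expected improvement, assumed uniform over all sizes


def ratio(a, b):
    """Element-wise a/b; values above 1 mean that a is slower than b."""
    return [x / y for x, y in zip(a, b)]


neon_projected = [t / NEON_SPEEDUP for t in NEON_NOW]
dsp_vs_pc = ratio(C674X_SP, PC_SP)             # best-case DSP float vs PC float
fix_vs_pc = ratio(C64XP_FIX, PC_SP)            # best-case DSP fixed vs PC float
neon_vs_dsp = ratio(neon_projected, C674X_SP)  # projected NEON vs DSP float

if __name__ == "__main__":
    print("size  C674xSP/PC  C64x+fix/PC  NEON(5x)/C674xSP")
    for i, size in enumerate(SIZES):
        print(f"{size:4s}  {dsp_vs_pc[i]:10.2f}  {fix_vs_pc[i]:11.2f}  "
              f"{neon_vs_dsp[i]:16.2f}")
```

With these numbers the best-case C674x SP time is 3 times the PC time at 2k and
about equal at 64k, and the projected NEON time stays a little over 3 times the
best-case C674x SP time at every size.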

The results show that
- A not-too-fast PC is faster than the DSP in floating point.
- The fixed-point performance of the DSP is similar to that of a not-too-fast PC.
- If the fftw3 speed on the ARM NEON really does increase, it will be
  comparable to the DSP floating-point performance.
- If we want to use a FIR filter with 4k taps at a sampling rate of 200k,
  the execution time (which is ~2.5 times the execution time of an 8k FFT)
  for the C674x SP is expected to be in the range of 2.5 - 8.5 ms.
  So even in the worst case we could run one receiver on the DSP, and in the
  best case several receivers in parallel.
- If we want to use the system as a receiver server, i.e. we want to run
  5+ receivers in parallel, then we must use fixed-point arithmetic. In this case
  the BeagleBoard has better performance due to its higher clock speed.
- The real advantage of the OMAP processor is that it provides FFT
  performance comparable to an Athlon processor at a fraction of the Athlon's power.
  If power consumption is important (like a server operating 24/7 or a
  portable transceiver), it is the way to go.
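
The filter estimate above can be reproduced with a few lines of Python. The
8k FFT times, the factor of 2.5 and the 3-10 times slowdown are the figures
from this post. The block period is my assumption: I suppose the 8k FFT block
delivers 4096 new output samples per pass (the block length minus the 4k
filter length), which at 200 kS/s is 20.48 ms. Only the filter cost is
counted, so the real number of receivers would be lower.

```python
# Best-case 8k complex FFT times in microseconds, copied from the table above.
FFT_8K_US = {"C674x SP": 335, "C674x fix": 133, "C64x+ fix": 77}

FIR_FACTOR = 2.5              # filter cost in units of one 8k FFT (from the text)
SLOWDOWN = (3, 10)            # real time versus best case (from the notes)
SAMPLE_RATE = 200_000         # samples per second (from the text)
NEW_SAMPLES_PER_BLOCK = 4096  # assumption: 8k block minus the 4k filter overlap


def fir_budget(fft_us):
    """Return (best_ms, worst_ms, filters_best, filters_worst) for one DSP."""
    block_ms = NEW_SAMPLES_PER_BLOCK / SAMPLE_RATE * 1000.0
    best_ms = fft_us * FIR_FACTOR * SLOWDOWN[0] / 1000.0
    worst_ms = fft_us * FIR_FACTOR * SLOWDOWN[1] / 1000.0
    # How many such filters fit into one block period.
    return best_ms, worst_ms, int(block_ms // best_ms), int(block_ms // worst_ms)


if __name__ == "__main__":
    for name, fft_us in FFT_8K_US.items():
        best, worst, n_best, n_worst = fir_budget(fft_us)
        print(f"{name:10s} {best:5.2f} - {worst:5.2f} ms per block, "
              f"{n_worst} to {n_best} filters")
```

For the C674x SP this gives about 2.5 to 8.4 ms per block, which is 2 to 8
filters per block period under the assumption above. The fixed-point columns
give 6 to 20 (C674x) and 10 to 35 (C64x+).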

I would like to use the following software architecture:
The tasks for the ARM core:
- Organizing the overall program flow
- External communication
- User interface (if any)
- Controlling the DSP core
- Communicating with programs running on the DSP core
The tasks for the DSP core:
- Running the SDR algorithms at the request of the program running on the ARM core.
For that purpose I could use the following software stack:
On the ARM core:
- The operating system is Linux. For that we have several choices:
  - OpenEmbedded
  - Angstrom, which is built on top of OpenEmbedded
  - Arago, which is built on top of Angstrom
  - Debian
  - Ubuntu
- User interface (if any). For that we could use the X Window System if the LCD panel
  we want to use is supported by it. If it is not supported, we must write our
  own graphics package.
- For inter-processor communication we could use TI's DSP/BIOS LINK package.
- Our application, probably written in C and probably based on ghpsdr3.
On the DSP core:
- The operating system is TI's DSP/BIOS. I have not yet found any examples of
  running code on TI's DSP without DSP/BIOS.
- Inter-processor communication: TI DSP/BIOS LINK
- Basic DSP algorithms: TI DSPLIB C674x for floating-point and
  DSPLIB C64x+ for fixed-point arithmetic.
- Our SDR algorithms, written in C and based on dttsp.

Some notes:
- The ARM side is completely open source and GPL licensed.
- The DSP side is not open source, but it can be freely distributed if it is
  for TI processors. We have the DSPLIB source code.

We have free toolchains for both the ARM and the DSP side, with some restrictions.
The ARM side toolchain is available only for Linux. For Windows we have a graphical
IDE for DSP development, while DSP development on Linux uses a command-line toolchain.
The restriction is that in the Windows IDE the support for hardware emulators and debuggers
is limited, and on the Linux side there is no support for DSP emulators and debugging.

For the ARM development we have two choices:
- OpenEmbedded-style development, where you develop your application as an OpenEmbedded package.
- SDK-style development, where you develop your application outside the operating system's
  build environment and use SDKs for the OS dependencies.

All the ARM toolchains are based on the GNU toolchain.
You could use the toolchain generated by the OpenEmbedded build system, or
you could generate your own custom toolchain online with Angstrom's Narcissus system:

http://www.angstrom-distribution.org/narcissus/

or you could use the toolchain from CodeSourcery:

http://www.codesourcery.com/

The last one is the preferred toolchain for use with TI's SDKs.

The DSP side toolchain can be downloaded from TI together with the platform SDKs
for both the ARM and the DSP side:

http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/omap_l138/1_00/latest/index_FDS.html
http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/dvsdk/DVSDK_3_00/latest/index_FDS.html

You can get the operating system for the ARM side from the platform SDK, or you can
build your own together with the kernel:

http://www.openembedded.org/
http://www.angstrom-distribution.org/
http://arago-project.org/wiki/index.php/Main_Page

DSP/BIOS and DSP/BIOS LINK are included in the platform SDK. The DSPLIB library is available
from TI:

http://processors.wiki.ti.com/index.php?title=C674x_DSPLIB
http://focus.ti.com/docs/toolsw/folders/print/sprc265.html

The TI platform SDKs are for the TI development boards. Most of the material in the SDKs is usable on the
BeagleBoard and the Hawkboard. You can generate your own operating system for these boards using
the Angstrom or Arago build system, or download a prebuilt one from their websites.

I hope all of this is useful to you.

73, Jani HA5FT



