UniBone - Software

Written by Administrator

- THIS IS WORK IN PROGRESS -

Architecture

A UniBone application (emulated device, test logger, device tester, SimH) always consists of voluminous C logic programs plus a high-speed signal processing core.

The goal is a clean API to the signal-processing PRUs, so that in the future end users only have to deal with their own application.

I first planned for a custom kernel module (aka device driver) to interface to the PRUs, but that's not necessary. However, you can always write a "/dev/unibus" kernel module, if you're a fan of additional software layers.

Even if the PRUs work in UNIBUS real time, device-emulating applications will be much slower. The code is fast, but routing data through several abstraction layers can be slow. For example, if a PDP-11 writes into some emulated "disk controller register", this may trigger a lot of work on the application side. Luckily UNIBUS is asynchronous: we can hold the bus in a waiting state (but not too long, or a bus timeout occurs).

The critical figure is the latency between a bus cycle that writes into some device register and the moment the user code starts operating. Keywords: "UIO interrupt latency".
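
To put a number on that latency, one could timestamp the moment the event is raised and the moment the user code wakes up. The following is only a minimal sketch: the names stamp(), elapsed_ns() and within_bus_budget() are made up for illustration, and the budget is a parameter because the real limit depends on the bus timeout of the machine in question.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Take a timestamp from the monotonic clock (not affected by clock changes). */
struct timespec stamp(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Nanoseconds elapsed from 'start' to 'end'. */
int64_t elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Hypothetical check: did the user code react within the time the bus
 * may be held in its waiting state? 'budget_ns' is an assumption that
 * has to be chosen for the real hardware. */
int within_bus_budget(int64_t latency_ns, int64_t budget_ns)
{
    assert(budget_ns > 0);
    return latency_ns >= 0 && latency_ns <= budget_ns;
}
```

In a real measurement the first stamp() would be taken as close as possible to the event source and the second one as the first statement of the handler; collecting the worst case over many runs matters more than the average.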

 

 

System environment

For the BBB, UniBone is a "cape". So it has the cape EEPROM with an id for the "cape manager" service, which activates the correct "device tree overlay". A DTO is a file which activates the PRUs (loads their device driver), and sets the correct pin multiplexing. Pinmux routes the logical PRU GPIO pins to physical BeagleBone header pins.

If the PRUs are enabled, their device driver module "uio_pruss" is loaded. It is accessed by the "prussdrv" API, which provides functions to start/stop a PRU, download code, use events and share memory. I had to try several Debian kernels, until I got it working. Now I use:

# uname -a
Linux unibone 4.9.100-bone-rt-r10 #1 PREEMPT RT Fri May 18 06:12:58 UTC 2018 armv7l GNU/Linux

However, with this kernel I lost the automatic activation of the cape device tree overlay. I could patch that quick and dirty.

And as usual under Linux: the moment you get something complicated working, G**gle hits start appearing saying "the <finally_I_got_it_working> API is now superseded by the <much_cooler_and_completly_different> mechanism ..."

In this case: don't use "UIO", the "remoteproc" interface is now standard to communicate with the PRUs. Maybe. Work on the PRUs is in progress.

 

Toolchains, toolchains

User code for the ARM is developed on an Ubuntu PC host under Eclipse. So we need a cross-compiler to generate ARM code on an x64 host. Remote debugging was also set up, which allows visual debugging of code running on the tiny BBB.

For a long time the BBB PRUs were programmed in assembly language; even the famous book by Derek Molloy does it. But Texas Instruments has a rich C compiler environment, with some examples. It's really worth the additional learning time.

The PRU compiler is named "clpru". The generated PRU code is written out as a C array (yes, "uint32_t code[]") and included in the ARM user application, which loads the code into the PRUs and starts execution. So the binary output of the PRU compiler is a source file in the ARM environment. In effect we have a double-length toolchain here.
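
The "binary as C array" idea can be sketched like this. It is not the real generated file: the array name pru_code, its two placeholder words (not real PRU opcodes) and the helper load_pru_image() are assumptions for illustration, and a plain buffer stands in for the PRU instruction RAM that the real loader would write to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the array the PRU toolchain would generate.
 * The values are placeholders, NOT real PRU instructions. */
const uint32_t pru_code[] = { 0x11111111u, 0x22222222u };

#define PRU_CODE_WORDS (sizeof(pru_code) / sizeof(pru_code[0]))

/* Copy a code image into a buffer standing in for PRU instruction RAM.
 * Returns the number of bytes copied, or 0 if the image does not fit. */
size_t load_pru_image(uint32_t *iram, size_t iram_words,
                      const uint32_t *image, size_t image_words)
{
    assert(iram != NULL && image != NULL);
    if (image_words > iram_words)
        return 0;
    memcpy(iram, image, image_words * sizeof(uint32_t));
    return image_words * sizeof(uint32_t);
}
```

The point of the design is that the ARM build needs no file I/O at run time to find the PRU firmware: the image is linked into the application and always matches it.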

Interaction between ARM and PRUs is over shared memory and interrupts. You have a lot to learn to set these up the right way.

Few people seem to use the PRUs (or only a few are talking about it); web search hits are very sparse. Almost no example code! There is only one TI example for each base mechanism:

  • toggle GPIOs,
  • read/write a shared memory mailbox area,
  • signal an interrupt event.

I needed weeks to set up the toolchains perfectly and get examples with basic infrastructure tests running.

1st real world operation

As a first test, I wrote a plain C application which does GPIO bit-banging with memory-mapped GPIOs (forget the sysfs /sys/class/gpio interface!). With memory-mapped GPIO we can toggle GPIOs at 2.5 MHz, so the fastest user-mode GPIO access is 25 times slower than the PRUs!
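
Memory-mapped bit-banging boils down to plain stores into the GPIO bank registers. The sketch below is not the UniBone code; the function names are made up, and the register offsets are the AM335x ones as I remember them from the TRM, so check them before use. 'base' is assumed to point at one GPIO bank, which on the BBB would come from mmap() on /dev/mem at the bank's physical address; for a dry run any 4 KB array will do.

```c
#include <assert.h>
#include <stdint.h>

/* AM335x GPIO register offsets in bytes (verify against the TRM). */
#define GPIO_OE            0x134u  /* output enable, 0 = output */
#define GPIO_CLEARDATAOUT  0x190u  /* write 1 to drive a pin low */
#define GPIO_SETDATAOUT    0x194u  /* write 1 to drive a pin high */

/* Address of a 32-bit register inside the mapped bank. */
static volatile uint32_t *gpio_reg(volatile uint32_t *base, uint32_t byte_offset)
{
    return base + byte_offset / 4u;
}

/* Configure one pin of the bank as an output. */
void gpio_make_output(volatile uint32_t *base, unsigned bit)
{
    assert(bit < 32);
    *gpio_reg(base, GPIO_OE) &= ~(1u << bit);
}

void gpio_set(volatile uint32_t *base, unsigned bit)
{
    assert(bit < 32);
    *gpio_reg(base, GPIO_SETDATAOUT) = 1u << bit;
}

void gpio_clear(volatile uint32_t *base, unsigned bit)
{
    assert(bit < 32);
    *gpio_reg(base, GPIO_CLEARDATAOUT) = 1u << bit;
}

/* Emit 'count' high/low pulses as fast as the CPU can issue the stores. */
void gpio_pulse(volatile uint32_t *base, unsigned bit, unsigned count)
{
    while (count--) {
        gpio_set(base, bit);
        gpio_clear(base, bit);
    }
}
```

The separate set and clear registers mean that no read-modify-write is needed per toggle, which is what makes this so much faster than going through the kernel for every edge.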

UniBone was mounted in a DEC DD11-CK 4-slot backplane from a PDP-11/34, and I added terminators and a 256 KB memory card.

[Image: unibone in dd11]

Then I wrote a "memory-test" program, exercising each cell in the 18-bit address range with DATI and DATO read/write cycles. UniBone is bus master.

This "memory-test" in fact tests the register logic on the UniBone.
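
The structure of such a test can be sketched as follows. This is not the original program: unibus_ops_t, memory_test() and the simulated backend are hypothetical names, and a plain array replaces the real bus so the loop can run on a PC. An optional "stuck at zero" mask in the simulation shows that the test really reports failing bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus access hooks; on UniBone these would drive real cycles. */
typedef struct {
    void     (*dato)(void *ctx, uint32_t addr, uint16_t data); /* write word */
    uint16_t (*dati)(void *ctx, uint32_t addr);                /* read word  */
    void      *ctx;
} unibus_ops_t;

/* Test pattern: the low 16 bits of the word index. */
static uint16_t pattern_for(uint32_t addr)
{
    return (uint16_t)(addr >> 1);
}

/* Write, then verify, every word in [start, end). Addresses are byte
 * addresses stepped by 2. Returns the number of mismatching words. */
uint32_t memory_test(const unibus_ops_t *bus, uint32_t start, uint32_t end)
{
    uint32_t addr, errors = 0;
    assert(bus != NULL && bus->dato != NULL && bus->dati != NULL);
    assert((start & 1u) == 0 && end <= 0x40000u);  /* 18 address bits */
    for (addr = start; addr < end; addr += 2)
        bus->dato(bus->ctx, addr, pattern_for(addr));
    for (addr = start; addr < end; addr += 2)
        if (bus->dati(bus->ctx, addr) != pattern_for(addr))
            errors++;
    return errors;
}

/* Simulated backend: an array instead of the bus. If ctx is not NULL it
 * points to a mask of data bits that always read back as zero. */
static uint16_t sim_mem[0x20000];

void sim_dato(void *ctx, uint32_t addr, uint16_t data)
{
    (void)ctx;
    sim_mem[addr >> 1] = data;
}

uint16_t sim_dati(void *ctx, uint32_t addr)
{
    uint16_t stuck_low = ctx ? *(const uint16_t *)ctx : 0;
    return (uint16_t)(sim_mem[addr >> 1] & (uint16_t)~stuck_low);
}
```

Keeping the bus access behind two function pointers means the same loop could later run on top of a faster PRU-driven backend without changes.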

Signals were also watched by a UNIBUS signal adapter, built years ago from a special board and a logic analyzer. The final setup is impressive; it could be used in a "mad scientist" movie scene:

[Image: unibone workplace]

As I did not use PRU software, access to UNIBUS is quite slow: a DATI or DATO cycle needs approx. 10 microseconds this way. Even in 1969 DEC reached 1 microsecond, so we really have to get faster!