Binary-Decision Programming

BD or Not BD

Binary-Decision based programming can be said to out-RISC RISC, as the choice of instructions is down to two: Jump and Output. The BD sequencer examines single bits in turn, choosing one of two memory addresses to jump to as each new bit comes in either high or low. At the new address will be yet another jump instruction - waiting for another bit to complete another jump - or an output instruction. A programmer can set up the sequencer to end at a specific output instruction based on any desired pattern in the incoming bits. Basically it's a hardware implementation of a binary tree traversal.

Binary Decision programming is easily implemented, and is a standard class of instructions included in the Flexible System Architecture™ language. At such longer word lengths, the output stage does not have to be merely one instruction, but can be the start of a block of more traditional sequenced code (sometimes called "Boolean" code).

However, the focus here is on shorter word length BD machines standing alone and (in the final example) controlling small subsections of low level hardwired logic within the FSA™.

For all these examples, we'll fit the BD program instructions into one byte divided into two nibbles. Nibble 0, on the right, will carry information about the next program address, while nibble 1 will carry instruction type and source bit info.

Two Bit Magnitude Comparator

For our first example, let's suppose there are two processes putting out two bits each. These are quantity bits; each pair representing a number encoded in binary, and implementation of a two bit magnitude comparator is desired as part of the software to control the two processes. In addition, each input bit is on its own wire, so:

input A1 (most significant bit) is carried on : wire 0 input A0 (least significant bit) is carried on : wire 1 input B1 (most significant bit) is carried on : wire 2 input B0 (least significant bit) is carried on : wire 3

By the way, the term 'wire' was chosen because it could imply an internal line laid out on chip, or an actual pin sticking out of an actual IC. Obviously, there has to be gating and busing surrounding the 10 bytes or so of local BD memory to get the I/O bits to and from the BD program. Anyway, getting back to our example, there are three wires available for output results, namely:

wire 4 carries : (process A output) < (process B output) wire 5 carries : (process A output) = (process B output) wire 6 carries : (process A output) > (process B output) that is, wire 4, when high : A < B wire 5, when high : A = B wire 6, when high : A > B (wires 4, 5 & 6 are normally low)

Since we need two of the nibble 1 bits reserved for use within the instructions, we'll have to encode the states of the input and output wires within a tiny two bit address in bit positions 5 & 4. For the input wires, the address will simply match the wire's number. For the outputs:

output address 00 : raises wire 4 high output address 01 : raises wire 5 high output address 10 : raises wire 6 high

It's probably easier to understand the reserved bits within the course of following the example, so let's look at the first line of the program. We want to look at either A1 or B1 first since a difference in these two most significant bits will allow us to ignore the two LSBs, taking us quickly to a solution. Of course, if A1 & B1 are the same, then we will have to consider A0 & B0, but we'll get to those soon enough.

a t d y X d p O 5 4 3 2 1 0 IF BRANCH r e R 0 0 0 0 0 0 1 1 0 A1 = 1 6, E.N. 1 0 1 1 0 0 0 1 1 B1 = 0 3, E.N. E.N. = Else Next

Let's decide to look at A1 first. Starting at program address 0, we code bits 5 & 4 as 00, the address of wire 0. Output wire 4 also has this encoding, but since the instruction type bit at the most significant position is here equal to 0, it tells us that we are here looking at an input wire.

Whatever signal is on that wire is Exclusive OR'd with the final bit at position six, and this is what drives the program's branching. When the Exclusive OR result of the XOR bit and the input bit is high, the encoded branch is taken, otherwise the associated program counter is incremented to the next higher addressed instruction.

So we have two possible branches; one a single increment beyond the current step, the other hard coded in the address nibble of this step's instruction. Let's look at the coding at those two possibilities:

a t d y X d p O 5 4 3 2 1 0 IF BRANCH r e R 0 0 0 0 0 0 1 1 0 A1 = 1 6, E.N. 1 0 1 1 0 0 0 1 1 B1 = 0 3, E.N. ... 6 0 0 1 0 0 0 1 1 B1 = 1 3, E.N. E.N. = Else Next

Both instructions want to look at B1, and notice how the branching is set up to jump to the same place, the instruction at address 3. This is reasonable, since the XOR high branch of these instructions is programmed to mean equality between A1 & B1, so we need to collect these two results into the start point for comparing A0 & B0:

3 0 0 0 1 1 0 0 0 A0 = 1 8, E.N.

If, on the other hand, either of the B1 choices is incremented to the instruction following it in sequence it means A1 & B1 are different and the program can fire an output wire:

2 1 0 0 0 0 0 0 0 out: A < B 0 ... 7 1 0 1 0 0 0 0 0 out: A > B 0

The "type" bit being set to 1 means these are output instructions, here addressing wires 4 & 6, respectively. On output conditions, the XOR bit is merely a modifier to the type bit (more on this soon), so no input bit is tested and the branch goes only to program address 0, thereby setting up the code at the original start point, ready to perform another magnitude comparison.

At this point the program is basically half finished. If we do have to branch to address 3 to test A0, the code from that point is virtually the same as the A1 code, just with different addresses. Here is the complete Compressed BD program:

a t d y X d p O 5 4 3 2 1 0 IF BRANCH r e R 0 0 0 0 0 0 1 1 0 A1 = 1 6, E.N. 1 0 1 1 0 0 0 1 1 B1 = 0 3, E.N. 2 1 0 0 0 0 0 0 0 out: A < B 0 3 0 0 0 1 1 0 0 0 A0 = 1 8, E.N. 4 0 0 1 1 0 0 1 0 B0 = 1 2, E.N. 5 1 0 0 1 0 0 0 0 out: A = B 0 6 0 0 1 0 0 0 1 1 B1 = 1 3, E.N. 7 1 0 1 0 0 0 0 0 out: A > B 0 8 0 1 1 1 0 1 1 1 B0 = 0 7, E.N. 9 1 0 0 1 0 0 0 0 out: A = B 0 E.N. = Else Next

Compressed BD at first isn't the easiest code to program or read. In particular, dealing with the XOR function requires some thought at each step, and some form of automation would be convenient. Although I have not followed up on it, a fair amount of research has taken place at Montreal's McGill University, both in simplifying binary decision tree diagrams to more compact trellis structures, and then in compiling optimized CBD code therefrom.

(As I mentioned in the link page - if you arrived here by way of the "technical tour" - I ultimately created a different version of BD programming, something I named "Relaxed BD," a form which I, at least find so much easier to write in that I haven't felt the need to follow up on minimization schemes. But then, I haven't yet tried to write a really big program in either format...)

One more point to make about this program is that it runs very fast, approaching one memory read cycle per bit of input even when all the bits have to be examined. And that's the great thing about BD programming: wise choice of the bit order can often, even usually, allow the program to reach an output state without having to deal with all the input bits, thereby saving processing time.

Before moving on to the next example, Compressed BD has one more trick up its sleeve. Notice the pattern 11 does not appear in the type-XOR bits 7 & 6. This pattern is read as, "Treat all available bits as output (in this case the six bits 5-0); next address has next instruction." What is interesting about this operation is that if several adjacent memory locations have this pattern, the program counter will "free run" through all of them, pumping out maximum output bits each cycle until a non-11 pattern is finally reached. So one jump can produce a huge amount of output data.

Four Bit AND Function

For the next example, let's suppose four different processes, each producing a high bit when it has finished, and we want to compute an AND function to determine when all have finished. Bit 4 will carry the output result in both systems, and we can use input addressing similar to before:

input A is carried on : wire 0 (addressed by 00) input B is carried on : wire 1 (addressed by 01) input C is carried on : wire 2 (addressed by 10) input D is carried on : wire 3 (addressed by 11)

Here's the full Compressed BD program:

a t d y X d p O 5 4 3 2 1 0 IF BRANCH r e R 0 0 1 0 0 0 1 0 1 A = 0 5, E.N. 1 0 1 0 1 0 1 0 1 B = 0 5, E.N. 2 0 1 1 0 0 1 0 1 C = 0 5, E.N. 3 0 1 1 1 0 1 0 1 D = 0 5, E.N. 4 1 0 x 1 0 0 0 0 out: done 0 5 1 0 x 0 0 0 0 0 out: more 0 E.N. = Else Next

This simple program basically polls all the processes one after another until all the input data bits are high. For this function, CBD is indeed "compressed:" with no increase in word size, this code could handle a 14 input AND function.

One Bit Full Adder

For one more simple example, let's create a one bit full adder, with input and output as such:

input X is addressed by 00 input Y is addressed by 01 input Ci (carry in) is addressed by 10 output S (sum) is bit 4 output Co (carry out) is bit 5 a t d y X d p O 5 4 3 2 1 0 IF BRANCH r e R 0 0 1 0 0 0 1 0 0 X = 1 4, E.N. 1 0 1 0 1 0 1 1 1 Y = 1 7, E.N. 2 0 1 1 0 1 0 0 1 Ci = 1 9, E.N. 3 1 0 0 0 0 1 0 1 Co = 0, S = 0 0 4 0 0 0 1 0 1 0 1 Y = 0 7, E.N. 5 0 0 1 0 0 1 1 0 Ci = 0 8, E.N. 6 1 0 1 1 0 1 0 1 Co = 1, S = 1 0 7 0 0 1 0 0 0 0 0 Ci = 0 9, E.N. 8 1 0 1 0 0 0 0 0 Co = 1, S = 0 0 9 1 0 0 1 0 0 0 0 Co = 0, S = 1 0 E.N. = Else Next

Usually, these BD programs will be started off by reception of an outside signal directed at them, loading the initial address and/or enabling a clock. However, sometimes there may be opportunities to let a BD system be more active in running itself. The AND Function code kind of does this; since the address is incrementing whenever it is not branching, that program can just run freely and be depended on to generate its output signal whenever it sees the AND pattern.

These three illustrations of BD programming hopefully give an idea of its use, as well as its potential within larger memory spaces. The focus in this section has remained on this smallest implementation partly because the examples are easy to follow, and partly because if BD can work within such a small space, it can work anywhere.

Many programming algorithms can be broken down into traversals through a binary tree, and in such cases Binary Decision based methods are fast, and have been shown to typically take up less memory than equivalent programs written in traditional machine languages. If the reader is interested in a more ambitious example, the Zsombor-Murray reference cited below contains an illustration of a BD implementation of a state machine acting as a traffic intersection controller, implemented in a 14 bit word, as well as a fuller treatment of BD's history and use. Or, for another ambitious example, continue reading...

BD in a Real Setting

Tackling something harder, or at least more interesting, I applied BD programming to an algorithm for computing binary square roots. In fact, I actually simulated it within the FSA simulator (see the marketing side of this website), and it works fine. I wrote the code in the proprietary Relaxed BD style I've referred to a few times, so the Compressed BD code below is itself untested, but should also work fine.

The algorithm itself is an adaptation of the by-hand method for decimal numbers. The binary version is simpler because a cut and fit multiplication step becomes simply multiplication by zero or one. A C language version of the algorithm is presented first for easy reference.

In the simulation, the do{} loop was removed and conditional tests based on the BD control bits inserted. Iteration then proceeded by multiple calls to the new function with a BD flag parameter.

void ComputeIntegerSquareRoot ( short arga, /* input argument register */ short *root) { short mina; short j, k; short bit = 0x8000; short tsta = 0; /* bits get shifted in from arga in pairs */ short wrka = 0; /* "working" reg; is subtracted from tsta */ short rslt = 0; /* square root result reg */ short cntn = 0; /* clock count nibble */ /* all shifts are to the left */ do{ /* two new bits get shifted in from arga to tsta */ for (k = 0; k < 2; k++) { cntn += 1; arga <<= 1; j = arga & bit; if (j) j = 1; tsta <<= 1; tsta |= j; } /* shift 1 into wrka, and (on test) 1 or 0 into rslt */ wrka <<= 1; wrka |= 1; rslt <<= 1; /* subtract to see if partial root candidate will fit */ mina = tsta - wrka; if (mina >= 0) /* test good, it fits! */ { wrka += 1; rslt |= 1; /* so rslt gets a One, not a Zero */ tsta = mina; /* and tsta must be updated */ } else wrka -= 1; cntn &= 0xF; } while (cntn != 0); *root = rslt; }

As for the BD code itself, here it is (the two numbers at the top allow the simulator to size the display of the BD memory).

# Number of bits, then number of words 8 10 # Square Root Algorithm Hardware Controller # # a t # d y X # d p O 5 4 3 2 1 0 IF / OUT BRANCH # r e R # 0 0 1 0 0 1 0 0 0 test nbl cntr 8, E.N. 1 1 1 0 0 0 0 0 1 shift 2 regs freerun 2 1 1 0 0 0 0 1 1 shift 3 regs freerun 3 1 1 0 0 0 1 0 0 subtract cmd freerun 4 1 1 0 0 1 0 0 0 shf sub rslt freerun 5 0 0 0 1 0 1 1 1 tst subtractn 7, E.N. 6 1 0 0 1 0 0 0 0 decremt wrka 0 7 1 0 1 0 0 0 0 0 incr & load 0 8 1 1 1 1 1 1 1 1 stop algorthm freerun 9 1 0 0 0 0 0 0 0 reset to addr = 0 #@! E.N. = Else Next freerun = Next

Once again, we use a two bit address at bit positions 5 & 4, although only bit 4 is truly active in this case:

input address 00 : nibble counter; 0 = 0000, 1 = else input address 01 : subtr test result; 0 = not, 1 = fits

And the output bits, which directly drive the logic, may sometimes be allowed in the address nibble:

output bits if bits 7 & 6 are coded 11: bit 0 : shift arga and tsta left by 1, with arga MSB --> tsta LSB bit 1 : shift 1 into wrka LSB bit 2 : subtract tsta - wrka bit 3 : shift inverted subtr MSB into rslt LSB all bits HIGH : stop algorithm, wait for another Start signal output bits if bits 7 & 6 are coded 10: bit 4 : decrement wrka bit 5 : increment wrka, - AND - : load subtr result into tsta

There are only two input tests the program has to worry about: the state of a 16 cycle wrap-around counter (equivalent to the while test in the C version), and whether or not the last subtraction was negative. Only at those two addresses does the program (possibly) branch rather than simply increment to the next instruction.

This is basically a proof-of-principle program, handling merely 16 bit data (although with a larger counter and bigger registers it could happily chug away for as many bits as desired). I was interested in making the process fit into an 8 bit word, but one does not have to be fanatical about fitting such an arbitrarily small space. When the FSA design effort moves beyond the demonstration phase, any implementation of square root logic will almost certainly exist within more general purpose hardware capable of computing other transcendental functions as well, such as a CORDIC engine. I have no worries about the applicability of Binary Decision programming to such a task.

Use 5 registers: argu_reg = input argument register test_reg = bits get shifted in from argu_reg in pairs work_reg = "working" reg; is subtracted from test_reg subt_reg = test_reg - work_reg is stored here rslt_reg = square root result register Clear regs then load value to be rooted in argu_reg The algorithm (all shifts are to the left unless otherwise noted): shift the 2 MSBs of argu_reg into test_reg LSBs shift a One into work_reg LSB subtract work_reg from test_reg and store in subt_reg algorithm test point: did subtraction yield a non-negative value? yes add One to work_reg shift One into rslt_reg copy subt_reg into test_reg no subtract One from work_reg shift Zero into rslt_reg Continue algorithm

*** Stopping point one (simple integer root) ***

Use of counter - the easy way to stop the algorithm once the whole number part of the root has been computed is to set up a counter equal to the word size of the processor being programmed. When the count has been reached, the root value is in rslt_reg.

If test_reg is all Zero bits at this time, the original input value was a perfect square - computation is basically done.

rslt_reg is borderline unnecessary - interestingly enough, two times that value is also in work_reg, meaning a simple right shift of work_reg would yield the integer root.

However, if a remainder is to be computed, work_reg will need to remain unchanged, or be restored, as the algorithm is continued.

rslt_reg is easily saved to some other memory location. It is not necessary to clear it to compute the remainder.

*** Stopping point two (fractional remainder) ***

The total number of bits of this algoritim is limited by the word size. Eventually the algorithm's shifting will be on the verge of moving a significant One past the MSB of either test_reg or work_reg. At this point their subtraction would be in error and the computation must end.

Use of counter - the early stop means the word count has usually not been reached. The correct remainder bit pattern is in rslt_reg, but is in the wrong place.

Justifying the remainder - rslt_reg must be shifted to the left by the incompleted value left in the count register so that the MSB will end up immediately to the right of the binary point.

rslt_reg is borderline unnecessary - once again, the remainder bit pattern is also in the work_reg, merely shifted one bit more to the right than rslt_reg.

_____________

References:

Zsombor-Murray, et al, "Binary-Decision-Based Programmable Controllers," Three part series in IEEE Micro, beginning in August 1983

R. T. Boute, "The Binary-Decision Machine as Programmable Controller," Euromicro Newsletter, Vol 1, No. 2, 1976, pp. 16-22.