Welcome to the home page of General Application Processing.
Read about the creation of a Flexible System Architecture
for the next generation of computer, System-On-Chip, and multicore
Screen shot of the FSA Simulator thirteen cycles into a
CORDIC computation of sine & cosine. Input: 45 degrees.
Output: cosine is at top of P1 at level C3, sine is just below
it. Both values are .70703125 in a fixed point format with
thirteen bits to the right of the binary point.
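For readers new to CORDIC, here is a minimal Python sketch of the rotation-mode algorithm in the same fixed point format (thirteen fractional bits). It is not the FSA Simulator's code: the name `cordic_sincos`, the iteration count, and the rounding choices are assumptions made for illustration, so the low-order bits may differ from the screenshot.

```python
import math

FRAC_BITS = 13                 # bits to the right of the binary point
ONE = 1 << FRAC_BITS           # fixed point representation of 1.0
ITERATIONS = 13                # assumed iteration count, one per fractional bit

# Rotation angles atan(2^-i), in radians, scaled to fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The shift-and-add rotations stretch the vector by a known gain, so the
# starting x value is pre-divided by that gain.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
X_START = round(ONE / GAIN)


def cordic_sincos(degrees):
    """Return (cos, sin) as fixed point integers for angles within +/-90 degrees."""
    x, y = X_START, 0
    z = round(math.radians(degrees) * ONE)   # remaining angle to rotate through
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate counter-clockwise by atan(2^-i) using only shifts and adds.
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            # Rotate clockwise by the same angle.
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x, y


cos_fx, sin_fx = cordic_sincos(45)
print(cos_fx / ONE, sin_fx / ONE)   # both close to 0.7071
```

As a check on the format itself, .70703125 times 2^13 is exactly 5792, which is the integer the fixed point word would hold.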
So it has been a while since I updated this home page.
Of course there's a reason.
I've read in various places that one thing you don't want to do is
rewrite a big system from scratch. "You will regret it!" I actually
believed this, in fact. It helped that examples were given - I think
Netscape was one of them. Yahoo too, possibly.
But in my case my thinking went, "Well, my FSA Simulator isn't all
that large, and besides, I wrote 99% of it so I know it well. Shouldn't
be a problem."
It's almost two years later and I'm only now seeing the light at the
end of the tunnel. Youch!
It's not like I didn't have good reason. The old version just ran too
slowly, and I had designed mucho new instructions to implement (and I
distrusted the robustness of the original instruction processing
code). Fair enough, probably had to be reworked anyway. But - Double
Besides the time hit, the worst thing is finding out how much really
bad code you wrote in earlier versions. And you cannot stay
willfully blind to that, because even though you think you are
merely going to rewrite a small block of functions ... they call
other functions. So sometimes you have to make changes all the
way up branches into the leafy nodes. Meanwhile, time keeps on
slippin' into the future...
Original static taxonomy graphic from EDN Magazine
I am changing the format of this homepage to a simpler, hopefully more
readable question and answer format, similar to a column you might see
in a technical magazine. Of course in this case, the 'editor' and the
'interviewee' are both the same person. But why not?
I've also reduced the links to the bar above. The original home page,
with its greater detail, is still available, and most of the pages on
the bar contain a fuller set of links to the site.
So, let's get right into the "interview."
In 25 words or less ...
The FSA provides extremely low-level control of things that run
Bit-twiddling, in other words.
Yes. On things that run the fastest: fast transistors, fast gates,
And what makes the FSA able to keep up?
And hopefully, stay ahead! A couple of things:
Its "Standard Sequencer" is extremely small and simple. It has no
microcode; indeed, it has little control logic at all in order to
keep instruction cycle times short. The instructions themselves are
minimal, and use minimal pipelining.
About half the instructions (and the most commonly used) operate directly
out of memory. It takes one gate delay to recognize these instructions,
and then BOOM, they're on their way; or, boom, they've checked a test bit
and are already loading a response. The rest fall into a decoder that's
simple enough to keep up with the clock rate.
That covers speed; what about power?
If you're talking about power dissipation, I would expect the FSA to
primarily be used within applications that have the circuitry running
pretty hot anyway. Still, a 16 bit core should certainly use less power
than a 32 bit core, if that's a requirement.
On the other hand, if you're asking about processing power, the FSA
has a multi-path ALU available to crunch control data. One path is
based on the venerable TTL 181, yielding common add, subtract, compare,
and bitwise logic functionality. But there's more: other paths perform
proprietary operations that a standard ALU can't do, at least not
without multiple clock cycles. So both time and code size can be
reduced. I should add that the ALU also has open paths available
for custom development. Even more power can be factored in.
And, it doesn't stop there. Even a multi-path ALU can still be limited
by bus contention issues, something that in a multicore system can bog
things down very easily. So the FSA has an overlaid communication
system that bypasses the ALU altogether. I can't really go into that any
further here, except to say that two hands are usually better than one.
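To make the conventional ALU path mentioned above concrete, here is a minimal Python sketch of a 16 bit ALU offering add, subtract, compare, and bitwise logic. The operation names, the flag dictionary, and the `compare` helper are hypothetical; they are not the 181's select codes, and the FSA's proprietary paths and overlaid communication system are not modelled at all.

```python
MASK16 = 0xFFFF   # a 16 bit word


def alu16(op, a, b):
    """Apply one ALU operation to two 16 bit words.

    Returns (result, flags), where flags holds a carry/borrow bit and a zero bit.
    """
    a &= MASK16
    b &= MASK16
    carry = False
    if op == "add":
        full = a + b
        carry = full > MASK16        # carry out of bit 15
    elif op == "sub":
        full = a - b
        carry = a < b                # borrow: b was larger than a
    elif op == "and":
        full = a & b
    elif op == "or":
        full = a | b
    elif op == "xor":
        full = a ^ b
    else:
        raise ValueError("unknown ALU operation: " + op)
    result = full & MASK16           # wrap to 16 bits (two's complement)
    return result, {"carry": carry, "zero": result == 0}


def compare(a, b):
    """Unsigned compare built on subtraction: -1 if a < b, 0 if equal, 1 if a > b."""
    _, flags = alu16("sub", a, b)
    if flags["zero"]:
        return 0
    return -1 if flags["carry"] else 1


print(alu16("add", 0xFFFF, 1))   # wraps around to zero with the carry flag set
print(compare(3, 5))
```

Compare falls out of subtract here, which is the usual reason a single adder-based path can cover all three arithmetic functions.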
You wrote in your original homepage that "the FSA machine language is
its own microcode," yet you just said that there was no microcoding of
Imagine the dealer in a game of blackjack. The dealer has to pass
out cards to the players around the table, and also has to be
responsive to input from those players ("hit me"), which can
affect the order of the card distribution.
The FSA is obviously the dealer in this analogy. What makes
it different than the zillion other cores that are out there?
First, the FSA is optimized to be the fastest dealer in the business.
One way to take advantage of its speed is to have a game with only
smart players. That is to say in a friendly game, the dealer might
offer some advice from time to time ("Should I stay on 16?"), but
this just slows down play while the dealer has to respond.
When the FSA is the 'dealer', the 'players' should definitely have
the primary responsibility for their own play.
To begin to stretch the blackjack analogy, it can also be said that
the FSA is the dealer with the most cards. Within its instruction
set's addressing range, the architecture can pass to IP subsystems
a plethora of information over multiple paths. Again, it all comes
down to what a computer design is optimized for. Not only is the FSA
a Control Architecture, but the control is fast and ... dense.
If you want to fit a general descriptive category, you could say
it is closest to vertical microcode. The difference is, where is that
microcode aimed? It's aimed at application logic rather than at its
When I was first developing the instruction set, I had a type of indirect
instruction that I assumed I would have to "microcode," that is, have
two standard sequencers side by side and use one of them to step the
other through the completion of the instruction. That turned out to be
unnecessary. A closer look at the problem revealed that the supposed
target sequencer had enough control built in to handle the instruction
Yet the facility is still there to put one sequencer beside
another to handle more complex tasks. In fact, I like to bundle a
"four-pack" of them together to allow for easy context switching and
various kinds of threaded code. But so far, no need for one to take
over and run another.
Speaking of threaded code, I recently wrote a Forth interpreter /
compiler for the FSA. It almost seemed like cheating to have
four interoperating sequencers available to spread the dictionary,
stacks, and everything else into.
I have to say I gained a lot of respect for Chuck Moore and other
Forth pioneers who did their development on single micros with only
one memory space and limited scratch registers.
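For anyone who hasn't met threaded code, here is a minimal Python sketch of the idea: a dictionary of words, where each word is either a primitive or a "thread" of other words, walked by a small inner interpreter with its own return stack. This is a generic illustration, not the FSA Forth. The word names, the `DICTIONARY` layout, and the `execute` function are all assumptions, and everything lives in one memory space rather than being spread across four sequencers.

```python
def prim_dup(stack):
    stack.append(stack[-1])


def prim_swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]


def prim_add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def prim_mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


# Primitives map to Python functions; compiled words are threads (lists of names).
DICTIONARY = {
    "DUP": prim_dup,
    "SWAP": prim_swap,
    "+": prim_add,
    "*": prim_mul,
    "SQUARE": ["DUP", "*"],
    "SUM-OF-SQUARES": ["SQUARE", "SWAP", "SQUARE", "+"],
}


def execute(name, stack):
    """Inner interpreter: run one word, nesting into threads via a return stack."""
    return_stack = []
    thread, ip = [name], 0
    while True:
        if ip == len(thread):
            if not return_stack:
                return stack                      # outermost thread finished
            thread, ip = return_stack.pop()       # resume the calling thread
            continue
        entry = DICTIONARY[thread[ip]]
        ip += 1
        if callable(entry):
            entry(stack)                          # primitive: run it directly
        else:
            return_stack.append((thread, ip))     # remember where to resume
            thread, ip = entry, 0                 # nest into the word's thread


print(execute("SUM-OF-SQUARES", [3, 4]))   # [25]
```

The data stack, return stack, and dictionary are the three structures that have to share space on a single small micro.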
Anyway, to get back to my original point, the microinstructions are
primarily meant to be sent directly to IP subsystems, the sequencer
acting like the dealer in a game of cards.
Of course, sometimes the IP wants to talk back, so a full one quarter
of the instruction set is dedicated to responding to what comes over an
internal test bus.
And if that test bus itself gets overloaded?
Why, then just drop in more sequencers. They're small, and some IP may
not need the full ALU capability, so localize the control resources, and
buses, where they're most needed.
A lot of people may have trouble with the 16 bit size ...
I understand. "The 80's have called and they want their architecture
back." Look, this FSA core doesn't care if your data paths are 64 bits
or a thousand bits wide, or how much memory your application is addressing.
Its job is to act as a traffic cop directing the bigger, wider bustle
going on around it. I mean, your head is smaller than your body, right?
I have held for a long time that sixteen bits is the very best word
size for this kind of low level control; I'm absolutely convinced of this.
Yes, if you're doing long computations or need floating point, you'll want
more bits, but to control something? If you need more than 16 bits,
you aren't thinking the problem through!
But is there any way to extend this architecture if that's what the
Well, those instructions which contain addresses have them on the left
hand side, so they could be extended indefinitely. Yet those instructions
which don't contain addresses would then contain a lot of wasted bits.
I don't like wasted bits. So while I do think it's easier to expand a
16 bit design to 32 bits than the other way around, I believe using
multiple 16 bit engines as distributed cores is a better way to go,
particularly for control purposes.
So, what applications do you see for the FSA?
Hopefully you noticed the animated .gif at the top of this section.
The FSA is meant to make more low level decisions than any other
architecture, and make them faster. Being a general purpose
machine, it could handle the whole left side of the picture out to
maybe a third, but I tried to make the graphic point to its sweet spot,
to a classification area that points to where no architecture
has gone before.
That is its general application target. If you want specifics,
I'll admit I'm not sure yet exactly what they might be.
One application I added to the Executive Summary a while back was
network processing. Expensive custom logic ruled the
day at the time, and the FSA seemed a perfect fit.
By now there are several commercial NPUs out there, and typically,
creating software for them is the biggest bottleneck - a game in
which the FSA would now be playing catch up. One aspect of network
processing I think it would excel at even now is Deep Packet Inspection.
Still, you get the idea. Potential niches are always coming along.
The core features of rapid processing, inherent parallelism,
and a small silicon (or GaAs?) footprint will always apply to high
performance and multicore design.
Throw in superior "decision processing," and you have something
that's pushing the envelope. The question is, which envelope?
How about standalone or embedded applications?
I remember the first Palm Pilot came out when I was working on an earlier
version of this architecture. I thought, "Damn, that should be MY chip
in there!" But, it wasn't yet ready for prime time.
I suppose you could say using an architecture as an on-chip core is a
form of "deep embedding," so maybe after getting its instruction set
entrenched in several such designs, the FSA could begin to gain
popularity in standalone development. That should be a few years down
the road if it happens.
Nowadays, while you could certainly make one heck of a single board
computer out of the FSA, how many such systems would it be competing
against? That niche seems pretty full to me. Yet, couple the FSA with
its somewhat quirky simulator / development system, and there might
be surprising acceptance from students and hobbyists. Time will tell.
Any final words?
Well, this may be merely an inventor's conceit, but the FSA is a
beautiful architecture. Elegance may not always win in the marketplace,
but it is certainly more enjoyable to develop within.
Bob Loy, Founder
From EDN magazine, Issue 11, 6/11/2009
"Tee up your multiprocessing options" by Robert Cravotta