Viva La Difference
Such a difference between human and computer language!
"In the beginning," we humans talked to our digital machines in their
own language. It was hard on the eyes.
Here are 9 lines of machine code that add the integers 3 and 5:
0011010001000111
0000010011100011
0000000000000011
0011101111100011
0000000000000101
0011011111100011
0000000010010000
0000010011110011
0001000000001100
It's all binary ONEs & ZEROs, sixteen at a time. 144 characters
altogether in a readable format (larger than a tweet!),
merely to add two small numbers.
Way back in the late 1940s, during the creation of the first
computers, and again in the mid-1970s, during the rollout of
the first hobby microcomputers, this is how we talked to them,
typically using a row of switches on the front panel: set one
"word," push a LOAD button to put the word into memory, and go
on to the next one. Try not to make any mistakes!
Not fun. It wasn't long before people figured out how to connect
a typewriter keyboard to the computer and bypass those panel
switches. Not to mention a printer so they could look at what
was stored (and find their inevitable mistakes).
Maybe the very first Input/Output systems still talked in binary,
but when your eyes get blurry and your fingers get cramped, you
start looking for shortcuts.
Here's the same code written in hexadecimal, a base 16 format
rather than the binary base 2:
3447
04e3
0003
3be3
0005
37e3
0090
04f3
100c
Well, at least it takes less typing ... Interestingly enough, this
format is still commonly used to display dumps of memory
during low level debugging sessions. But it didn't take humans
very long to start working on better, clearer ways to talk to the
darn things.
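The step from binary to hex is purely mechanical: every group of four
bits becomes one hex digit. Here is a minimal Python sketch of that
conversion, using the nine words from the binary listing. The names
WORDS, HEX_WORDS and to_hex are made up for this illustration.

```python
# The nine 16-bit words from the binary listing, as strings of ones and zeros.
WORDS = [
    "0011010001000111",
    "0000010011100011",
    "0000000000000011",
    "0011101111100011",
    "0000000000000101",
    "0011011111100011",
    "0000000010010000",
    "0000010011110011",
    "0001000000001100",
]


def to_hex(bits):
    """Convert a string of binary digits to a 4-digit lowercase hex string."""
    return format(int(bits, 2), "04x")


HEX_WORDS = [to_hex(word) for word in WORDS]

# Show each binary word next to its hex spelling.
for bits, hex_word in zip(WORDS, HEX_WORDS):
    print(bits, "->", hex_word)
```

Running it prints each binary word beside its four-digit hex form,
which should match the hex listing line for line.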
Here's the same code in a somewhat more readable format:
# add two integers and store result in reg # 4
03101013 # direct set choosing alu181 path
00103203 # IMMED into register # 4
00000003 # data - integer 3
03233203 # IMMED to BPP_ALU_REG
00000011 # data - integer 5 (11 in base 4)
03133203 # IMMED to ALU_CTRL_REG
00002100 # instruction - load PLUS-no-carry-in encoding
00103303 # CYCLE from-to register # 4
01000030 # reset alu181 path (load 'straight' path)
The big change here is the addition of comments. At least now one
can follow the flow and the reasoning behind it, and also one can
tell the difference between instructions and data. The formatting
can be set to be pleasing to human eyes, and a translating
program can ignore "whitespace" and comments and still get the
proper binary numbers into the system memory.
This is actual native machine code which can run on my FSA simulator.
This code is in yet another base, base 4. This came about kind of
by accident, as my architecture turns out to have quite a number
of inner fields that are two bits wide.
So for this system it is clearer to use base 4 instead of binary
or hex. It's actually not too bad ... well, truly, after writing
several thousand lines of code for the FSA, I have to say it's
pretty bad. Not terrible, but very tedious.
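A translating program for this commented base-4 format can be very
small. The following Python sketch is not the actual FSA loader, just
an assumption about how one might look: it throws away everything
after a "#", ignores whitespace, and reads what remains as a base-4
number. The names load_base4, SOURCE and MEMORY are hypothetical.

```python
def load_base4(source):
    """Turn commented base-4 source text into a list of integer words."""
    words = []
    for line in source.splitlines():
        code = line.split("#", 1)[0]      # drop the comment, if any
        digits = "".join(code.split())    # drop all whitespace
        if not digits:
            continue                      # blank or comment-only line
        words.append(int(digits, 4))      # eight base-4 digits make 16 bits
    return words


# The first few lines of the base-4 listing, comments and all.
SOURCE = """\
# add two integers and store result in reg # 4
03101013   # direct set choosing alu181 path
00103203   # IMMED into register # 4
00000003   # data
"""

MEMORY = load_base4(SOURCE)

# Print each word the way it would sit in memory: sixteen binary digits.
for word in MEMORY:
    print(format(word, "016b"))
```

The three words printed should be the same as the first three lines of
the original binary listing, which is the whole point: comments and
layout are for humans, and the machine still gets the same bits.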
Can we come up with a translator that can actually speak English!?
Actual, conversational English? Not at all an easy problem.
However, just to avoid typing computer-friendly numbers on every
single line, the next step is to create a more user-friendly
"assembly language." I've made up a fake one here for demonstration
purposes:
MOV 3 @4 # store integer 3 into register #4
MOV 5 @5 # store integer 5 into register #5
ADD @4 @5 # add registers #4 & #5 into accumulator
STOR AC @4 # copy accumulator into register #4
Definitely more readable - no numbers except for common decimal ones!
This program would run on a computer slightly different from mine.
The FSA doesn't use an accumulator, for example, and I like to use
registers as little as possible.
Still, an intermediate program would translate from this assembly
language to the machine code of the target computer. Basically,
these four lines would be equivalent to the innermost 7 lines of
the FSA program.
Here's a similar fake assembly language program doing the same
operation, but this version is stack oriented rather than
register oriented:
PUSH 3 # stack integer 3
PUSH 5 # stack integer 5
ADD STAK # add top two numbers on stack (& put result on same stack)
POP @4 # move stack result to register #4
And what's a "stack"? Imagine a stack of cafeteria trays, and
imagine adding two more trays to the top, one with a big "3" written
on it and one with a big "5." Now turn that concept into a stack of
memory cells somewhere in the computer. When the numbers are added,
they are consumed (or "popped") off the stack, and replaced by the
new number. So trays 3 & 5 get replaced by a tray with a big "8" and
the stack is one tray shorter. Then the sum is moved elsewhere and the
stack is back to its original height.
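The tray picture can be acted out in a few lines of Python, with a
list standing in for the stack of memory cells. This is only a toy
model of the fake stack assembly above. The function name
run_stack_program and the idea of recording a snapshot after every
instruction are assumptions made for the illustration.

```python
def run_stack_program(source):
    """Run the fake stack assembly; return (registers, stack snapshots)."""
    stack, registers, snapshots = [], {}, []
    for line in source.splitlines():
        code = line.split("#", 1)[0].split()   # drop comment, split into words
        if not code:
            continue
        op = code[0]
        if op == "PUSH":                       # put a new tray on top
            stack.append(int(code[1]))
        elif op == "ADD":                      # "ADD STAK": pop two, push the sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":                      # "POP @4": move the top tray to register 4
            registers[int(code[1].lstrip("@"))] = stack.pop()
        else:
            raise ValueError("unknown instruction: " + op)
        snapshots.append(list(stack))          # remember the stack after this step
    return registers, snapshots


STACK_PROGRAM = """\
PUSH 3     # stack integer 3
PUSH 5     # stack integer 5
ADD STAK   # add top two numbers on stack (& put result on same stack)
POP @4     # move stack result to register #4
"""

REGISTERS, SNAPSHOTS = run_stack_program(STACK_PROGRAM)
print("registers:", REGISTERS)
for snapshot in SNAPSHOTS:
    print("stack:", snapshot)
```

The snapshots show the stack growing to two trays, shrinking to one
when the sum replaces them, and ending empty once the sum is moved to
register #4.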
Each assembly language segment is about as readable as the
other. I'll let the reader decide how ultimately readable hundreds
or thousands or even millions of lines like these would be.
So what would be WAY more readable? How about something like:
somevar = 3 + 5
Or at least:
3 5 +
So those are in English, sort of.
The first one is in some kind of higher level language with a named
variable storing the result. The second is more suitable for
translation into a stack based machine. In each case, one terse line
substitutes for multiple lines of native code or assembler.
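To see why the second form suits a stack machine, note that "3 5 +"
can be turned into the fake stack assembly one token at a time, with
no lookahead at all. A rough Python sketch, handling only integers and
"+", with the hypothetical name postfix_to_assembly:

```python
def postfix_to_assembly(expression):
    """Translate a postfix expression such as '3 5 +' into fake stack assembly."""
    lines = []
    for token in expression.split():
        if token == "+":
            lines.append("ADD STAK")               # operator: add the top two entries
        else:
            lines.append("PUSH " + str(int(token)))  # number: push it (int() rejects junk)
    return lines


ASSEMBLY = postfix_to_assembly("3 5 +")
for line in ASSEMBLY:
    print(line)
```

One terse input line becomes three lines of assembler, and each of
those would in turn become one or more words of machine code.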
For each conceptual step we've taken so far, the translation program
has to do more. The first three examples merely have to translate
a line of text into a binary value to be slotted into the machine's
memory. Basically the front panel switches have been turned into
those lines of text. The third example isn't really that much more
complicated than the first two; the translator simply has to learn
to ignore stuff that isn't a value destined for the machine.
The assembly language translators have to be more sophisticated.
English words like MOV and ADD have to be read in and worked over
to become one or more lines of machine code before that code can
be loaded into memory. Numbers with an @ sign before them are very
different than numbers standing alone. In other words, the translator
has to deal with a richer "English" input format.
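Here is a sketch of the front half of such a translator in Python,
written for the fake register assembly above. It splits each line
into a mnemonic and operands and tells "@4" (a register) apart from
"3" (a plain number). Since the fake language has no real machine
encoding, the sketch runs the parsed instructions on a toy machine
instead of emitting machine words. All function names and the exact
meaning given to MOV, ADD and STOR are assumptions.

```python
def parse_operand(text):
    """Classify one operand: '@4' is a register, 'AC' the accumulator, '3' a number."""
    if text.startswith("@"):
        return ("reg", int(text[1:]))
    if text == "AC":
        return ("acc", None)
    return ("imm", int(text))


def parse_line(line):
    """Split one source line into (mnemonic, operands); None for an empty line."""
    code = line.split("#", 1)[0].split()
    if not code:
        return None
    return (code[0], [parse_operand(word) for word in code[1:]])


def read(operand, regs, acc):
    """Fetch the value an operand refers to."""
    kind, number = operand
    if kind == "imm":
        return number
    if kind == "acc":
        return acc
    return regs[number]


def run(source):
    """Execute the parsed program on a toy machine; return the registers."""
    regs, acc = {}, 0
    for line in source.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        op, args = parsed
        if op in ("MOV", "STOR"):      # copy a value into the destination register
            regs[args[1][1]] = read(args[0], regs, acc)
        elif op == "ADD":              # add two operands into the accumulator
            acc = read(args[0], regs, acc) + read(args[1], regs, acc)
        else:
            raise ValueError("unknown mnemonic: " + op)
    return regs


REGISTER_PROGRAM = """\
MOV 3 @4    # store integer 3 into register #4
MOV 5 @5    # store integer 5 into register #5
ADD @4 @5   # add registers #4 & #5 into accumulator
STOR AC @4  # copy accumulator into register #4
"""

print(parse_line("MOV 3 @4  # store integer 3 into register #4"))
print(run(REGISTER_PROGRAM))
```

A real assembler would replace the run step with a table that maps
each mnemonic and operand kind to the target machine's bit patterns.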
The last two examples are even more terse, and require an even more
advanced program. There are two basic kinds of such translators,
"interpreters" and "compilers."
An interpreter typically takes a single line of the typed input and
carries it out immediately, whether by executing it directly or by
turning it into machine code that runs right away. So for the line:
"somevar = 3 + 5"
the interpreter has to figure out that somevar is a variable
for storing the result, and that the + sign is ultimately an
ADD instruction.
Chances are there's a formalism enforced on the English "source
code" containing such lines to help the interpreter figure things
out when it gets to them. The interpreter might need a
"declaration" line for somevar something like:
variable integer = somevar
A complete program which would display the result might be:
variable integer = somevar
somevar = 3 + 5
print somevar
These statements could be typed in one at a time, or could be
saved in a "source file" which would be read into the interpreter.
Note that the words "variable," "integer," and "print" would have to
be reserved for the interpreter's use. You couldn't get away with
"variable = 3 + 5", for example. It would confuse the interpreter,
which would no doubt print an error.
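A line-at-a-time interpreter for this made-up language fits in a page
of Python. The sketch below is only one possible reading of the rules
described above: it understands a declaration, a two-term addition,
and print, and it refuses reserved words used as variable names. The
class name TinyInterpreter and its error messages are invented for the
example.

```python
RESERVED = {"variable", "integer", "print"}


class TinyInterpreter:
    """Executes one line of the made-up language at a time."""

    def __init__(self):
        self.variables = {}   # declared variables and their current values
        self.output = []      # everything "printed" so far

    def execute(self, line):
        tokens = line.split()
        if not tokens:
            return
        if len(tokens) == 4 and tokens[:3] == ["variable", "integer", "="]:
            self._check_name(tokens[3])            # declaration
            self.variables[tokens[3]] = 0
        elif len(tokens) == 2 and tokens[0] == "print":
            self.output.append(self._value(tokens[1]))
            print(self.output[-1])
        elif len(tokens) == 5 and tokens[1] == "=" and tokens[3] == "+":
            name = tokens[0]                       # assignment: name = a + b
            self._check_name(name)
            if name not in self.variables:
                raise NameError("undeclared variable: " + name)
            self.variables[name] = self._value(tokens[2]) + self._value(tokens[4])
        else:
            raise SyntaxError("cannot understand: " + line)

    def _check_name(self, name):
        if name in RESERVED:
            raise SyntaxError("'" + name + "' is a reserved word")

    def _value(self, token):
        if token in self.variables:
            return self.variables[token]
        return int(token)


interpreter = TinyInterpreter()
for source_line in ["variable integer = somevar", "somevar = 3 + 5", "print somevar"]:
    interpreter.execute(source_line)

try:
    interpreter.execute("variable = 3 + 5")
except SyntaxError as error:
    print("error:", error)
```

The three-line program prints its result as soon as the last line is
typed, and the illegal statement is rejected on the spot rather than
at some later build step.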
The next step up is the compiler. It takes all the lines in a
source file, crunches on them for a while, and produces a finished
program with a unique name. The above three lines might be compiled
into "myprogram.exe" for example, and a compiling session might look
something like this:
> compile mysource myprogram
> run myprogram
> 8
Both the interpreter and the compiler end up with machine code doing
the work. The difference is that the compiler, since it operates on
a whole source file rather than on one line at a time, can access a
bag of tricks to make the program run much faster.
On the other hand, the interpreter can be more user friendly,
giving the results of small source changes more quickly, and
letting you know immediately that "variable = 3 + 5" is an illegal
statement, allowing you to debug as you code each line.
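The text doesn't say which tricks are in the compiler's bag, but one
well-known example is "constant folding": since 3 + 5 can never be
anything but 8, the compiler can do the addition once, ahead of time,
so the finished program never adds at all. The sketch below borrows
Python's own parser (the standard ast module, Python 3.9 or later for
ast.unparse) as a stand-in for a real compiler front end. The function
name fold_constants is hypothetical.

```python
import ast


class Folder(ast.NodeTransformer):
    """Replaces 'integer + integer' with the computed sum."""

    def visit_BinOp(self, node):
        self.generic_visit(node)   # fold the inner expressions first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, int)
                and isinstance(node.right.value, int)):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


def fold_constants(source):
    """Return the source text with constant additions already carried out."""
    tree = Folder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(fold_constants("somevar = 3 + 5"))
```

An interpreter reading one line at a time could do this particular
fold too, but tricks that depend on seeing how a value is used later
in the file are only available to a translator that reads the whole
file first.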
Compilers in particular may produce both machine & assembly code,
and may even add simple comments to the assembler output.
We've come quite a way towards something human-friendly, even if
it is not truly conversational English. Typing (or speaking!) an
input like:
"Please add three and five and send the result to my screen."
would be possible to process today, but having a really
LONG conversation turned into a working program ain't gonna
happen - not yet, anyway.