Viva La Difference

Such a difference between human and computer language!

"In the beginning," we humans talked to our digital machines in their own language. It was hard on the eyes.

Here are 9 lines of machine code that add the integers 3 and 5:

0011010001000111
0000010011100011
0000000000000011
0011101111100011
0000000000000101
0011011111100011
0000000010010000
0000010011110011
0001000000001100

It's all binary ONEs & ZEROs, sixteen at a time. 144 characters altogether in a readable format (larger than a tweet!), merely to add two small numbers.

Way back in the late 1940s, during the creation of the first computers, and again in the mid-1970s, during the rollout of the first hobby microcomputers, this is how we talked to them, typically using a row of switches on the front panel: set one "word," push a LOAD button to put the word into memory, and go on to the next one. Try not to make any mistakes!

Not fun. It wasn't long before people figured out how to connect a typewriter keyboard to the computer and bypass those panel switches. Not to mention a printer so they could look at what was stored (and find their inevitable mistakes).

Maybe the very first Input/Output systems still talked in binary, but when your eyes get blurry and your fingers get cramped, you start looking for shortcuts.

Here's the same code written in hexadecimal, base 16 rather than binary's base 2:

3447
04e3
0003
3be3
0005
37e3
0090
04f3
100c

Well, at least it takes less typing ... Interestingly enough, this format is still commonly used to display memory dumps during low-level debugging sessions. But it didn't take humans very long to start working on better, clearer ways to talk to the darn things.
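Those hex digits are just the same bits regrouped four at a time, one hex digit per group. A quick sketch of the conversion, using the first few binary words above:

```python
# Regrouping a 16-bit binary word four bits at a time yields its hex form.
def to_hex(word):
    return f"{int(word, 2):04x}"   # parse as base 2, print as 4 hex digits

for w in ["0011010001000111", "0000010011100011", "0000000000000011"]:
    print(to_hex(w))   # prints 3447, 04e3, 0003
```

Each group of four bits maps to exactly one hex digit, which is why hex caught on: it's a straight re-spelling of the binary, not a new encoding.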

Here's the same code in a somewhat more readable format:

                  # add two integers and store result in reg # 4
        03101013  #   direct set choosing alu181 path

        00103203  #   IMMED into register # 4
        00000003  #   data - binary 3

        03233203  #   IMMED to BPP_ALU_REG
        00000011  #   data - binary 5

        03133203  #   IMMED to ALU_CTRL_REG
        00002100  #   instruction - load PLUS-no-carry-in encoding
        00103303  #   CYCLE from-to register # 4

        01000030  #   reset alu181 path (load 'straight' path)

The big change here is the addition of comments. At least now one can follow the flow and the reasoning behind it, and also one can tell the difference between instructions and data. The formatting can be set to be pleasing to human eyes, and a translating program can ignore "whitespace" and comments and still get the proper binary numbers into the system memory.

This is actual native machine code which can run on my FSA simulator, written in yet another base, base 4. That came about somewhat by accident, as my architecture turns out to have quite a number of inner fields that are two bits wide.

So for this system it is clearer to use base 4 instead of binary or hex. It's actually not too bad ... well, truly, after writing several thousand lines of code for the FSA, I have to say it's pretty bad. Not terrible, but very tedious.
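The translating program described above doesn't need much: ignore comments and blank lines, then turn each 8-digit base-4 word into a 16-bit value (each base-4 digit is exactly two bits). A sketch, fed the first few lines of the FSA program:

```python
# A sketch of the translator: strip comments/whitespace, parse base-4 words.
def load(source):
    words = []
    for line in source.splitlines():
        text = line.split("#", 1)[0].strip()   # drop any comment, then trim
        if text:                               # skip blank/comment-only lines
            words.append(int(text, 4))         # each base-4 digit = two bits
    return words

program = """
          # add two integers and store result in reg # 4
03101013  #   direct set choosing alu181 path
00103203  #   IMMED into register # 4
00000003  #   data - binary 3
"""
for w in load(program):
    print(f"{w:016b}")   # first word prints 0011010001000111
```

Note that the loaded words match the binary listing at the top of this chapter: the formatting and comments are purely for human eyes, and the translator throws them away.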

Can we come up with a translator that can actually speak English!? Actual, conversational English? Not at all an easy problem.

However, just to avoid typing computer-friendly numbers on every single line, the next step is to create a more user-friendly "assembly language." I've made up a fake one here for demonstration purposes:

MOV   3 @4   # store integer 3 into register #4
MOV   5 @5   # store integer 5 into register #5
ADD  @4 @5   # add registers #4 & #5 into accumulator
STOR AC @4   # copy accumulator into register #4

Definitely more readable - no numbers except common decimal ones! This program would run on a computer slightly different from mine: the FSA doesn't use an accumulator, for example, and I like to use registers as sparingly as possible.

Still, an intermediate program would translate from this assembly language to the machine code of the target computer. Basically, these four lines would be equivalent to the innermost seven lines of the FSA program.

Here's a similar fake assembly language program doing the same operation, but this version is stack oriented rather than register oriented:

PUSH 3     # stack integer 3
PUSH 5     # stack integer 5
ADD STAK   # add top two numbers on stack (& put result on same stack)
POP @4     # move stack result to register #4

And what's a "stack"? Imagine a stack of cafeteria trays, and imagine adding two more trays to the top, one with a big "3" written on it and one with a big "5." Now turn that concept into a stack of memory cells somewhere in the computer. When the numbers are added, they are consumed (or "popped") off the stack, and replaced by the new number. So trays 3 & 5 get replaced by a tray with a big "8" and the stack is one tray shorter. Then the sum is moved elsewhere and the stack is back to its original height.
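The cafeteria-tray picture maps directly onto a list used as a stack. A sketch of the four stack instructions above, with a plain Python variable standing in for register #4:

```python
# The tray stack as a Python list: append is PUSH, pop takes the top tray.
stack = []
stack.append(3)                           # PUSH 3
stack.append(5)                           # PUSH 5
stack.append(stack.pop() + stack.pop())   # ADD STAK: consume 5 and 3, push 8
register_4 = stack.pop()                  # POP @4: move the sum off the stack
print(register_4)                         # prints 8; the stack is empty again
```

After the ADD the stack really is one tray shorter, and after the POP it's back to its original height, just as in the tray story.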

Each assembly language segment is about as readable as the other. I'll let the reader decide how ultimately readable hundreds or thousands or even millions of lines like these would be.

So what would be WAY more readable? How about something like:

somevar = 3 + 5

Or at least:

3 5 +

So those are in English, sort of.

The first one is in some kind of higher level language with a named variable storing the result. The second is more suitable for translation into a stack based machine. In each case, one terse line substitutes for multiple lines of native code or assembler.
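The "3 5 +" form (postfix notation) is easy to translate for a stack machine because the evaluation rule is mechanical: push numbers, and when an operator appears, pop two values and push the result. A sketch of such an evaluator (the operator table here is my own choice of examples):

```python
# A postfix ("3 5 +") evaluator driven by a stack.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_postfix(expr):
    stack = []
    for token in expr.split():
        if token in OPS:
            b, a = stack.pop(), stack.pop()   # pop the top two values
            stack.append(OPS[token](a, b))    # push the result back
        else:
            stack.append(int(token))          # numbers just get pushed
    return stack.pop()

print(eval_postfix("3 5 +"))   # prints 8
```

One terse line of source becomes a handful of stack operations, which is exactly the PUSH/PUSH/ADD/POP sequence from the stack-oriented assembly above.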

For each conceptual step we've taken so far, the translation program has to do more. For the first three examples, it merely has to translate each line of text into a binary value to be slotted into the machine's memory - basically, the front panel switches have been turned into lines of text. The third example isn't really much more complicated than the first two; the translator simply has to learn to ignore stuff that isn't a value destined for the machine.

The assembly language translators have to be more sophisticated. English words like MOV and ADD have to be read in and worked over to become one or more lines of machine code before that code can be loaded into memory. Numbers with an @ sign before them are very different from numbers standing alone. In other words, the translator has to deal with a richer "English" input format.
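That @-sign distinction is easy to see in code. A sketch of how an assembler might classify the two kinds of operands (the tagged-tuple format is just for illustration; a real assembler would go on to emit machine-specific opcodes):

```python
# Classify assembler operands: "@4" names register 4, bare "4" is a literal.
def parse_operand(token):
    if token.startswith("@"):
        return ("register", int(token[1:]))   # strip the @, keep the number
    return ("immediate", int(token))          # a plain value, used as-is

print(parse_operand("@4"))   # prints ('register', 4)
print(parse_operand("3"))    # prints ('immediate', 3)
```

So in "MOV 3 @4" the translator sees an immediate value and a register destination, and picks its machine encoding accordingly.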

The last two examples are even more terse, and require an even more advanced program. There are two basic kinds of such translators, "interpreters" and "compilers."

An interpreter typically takes a single line of the typed input and turns it into machine code that immediately runs. So for the line: "somevar = 3 + 5" the interpreter has to figure out that somevar is a variable for storing the result, and that the + sign is ultimately an ADD instruction.

Chances are there's a formalism enforced on the English "source code" containing such lines, to help the interpreter figure things out as it gets to them. The interpreter might need a "declaration" line for somevar, something like:

variable integer = somevar

A complete program which would display the result might be:

variable integer = somevar
somevar = 3 + 5
print somevar

These statements could be typed in one at a time, or saved in a "source file" to be read into the interpreter. Note that the words "variable," "integer," and "print" would have to be reserved for the interpreter's use. You couldn't get away with "variable = 3 + 5", for example; it would confuse things and no doubt print an error.
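A toy interpreter for that three-line program might look like the following sketch; the parsing rules are my own guess at this made-up language's formalism, handling exactly the declaration, assignment, and print forms shown above:

```python
# A toy line-at-a-time interpreter for the made-up three-line language above.
def interpret(source):
    variables = {}
    for line in source.strip().splitlines():
        words = line.split()
        if words[0] == "variable":       # declaration: "variable integer = name"
            variables[words[3]] = 0
        elif words[0] == "print":        # output: "print name"
            print(variables[words[1]])
        else:                            # assignment: "name = a + b"
            name, _, a, _, b = words
            if name not in variables:
                raise NameError(f"undeclared variable: {name}")
            variables[name] = int(a) + int(b)
    return variables

interpret("variable integer = somevar\nsomevar = 3 + 5\nprint somevar")  # prints 8
```

Note how the reserved words "variable" and "print" steer the interpreter to the right rule, and how assigning to an undeclared name produces an error rather than silent confusion.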

The next step up is the compiler. It takes all the lines in a source file, crunches on them for a while, and produces a finished program with a unique name. The above three lines might be compiled into "myprogram.exe" for example, and a compiling session might look something like this:

> compile mysource myprogram
> run myprogram
8

Both the interpreter and compiler produce machine code. The difference is that the compiler, since it operates on a whole source file rather than on one line at a time, can access a bag of tricks to make the program run much faster.
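One classic trick from that bag is constant folding: since 3 and 5 are both known at compile time, the compiler can do the addition itself and emit code that just stores 8. A sketch of the idea using Python's own ast module (handling only the two-constant addition case, for illustration):

```python
# Constant folding: if both operands of + are known constants, add them now.
import ast

def fold(expr):
    node = ast.parse(expr, mode="eval").body
    if (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)):
        return node.left.value + node.right.value   # done at "compile time"
    return expr                                     # leave anything else alone

print(fold("3 + 5"))   # prints 8: no ADD instruction needs to run at all
```

The compiled program then never performs the addition at run time, which is the kind of speedup a line-at-a-time interpreter can't easily get.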

On the other hand, the interpreter can be more user friendly, giving the results of small source changes more quickly, and letting you know immediately that "variable = 3 + 5" is an illegal statement, allowing you to debug as you code each line.

Compilers in particular may produce both machine & assembly code, and may even add simple comments to the assembler output.

We've come quite a way towards something human-friendly, even if it is not truly conversational English. Typing (or speaking!) an input like:

"Please add three and five and send the result to my screen."

would be possible to process today, but having a really LONG conversation turned into a working program ain't gonna happen - not yet, anyway.





Started: August 14, 2010