When a software project grows even a little bit beyond "small," managing its complexity becomes one of the biggest factors affecting its ultimate success - maybe the biggest factor.
I'm not talking about commercial success (although that too will be impacted by complexity issues); I'm talking about the point at which you start pulling your hair out because it's getting so hard to stay on top of what you're trying to create (or later, what you're trying to maintain).
Early on, I realized that a large proportion of programming is what I called "bookkeeping." Commenting, version tracking, archiving, organizing the testing, deciding what functions to group together in what (hopefully) modular file ... pretty soon it begins to feel like for every creative line of code you write, you're performing half a dozen scut jobs. The ratio may not be quite that bad, but it sometimes seems like it.
I've come to the conclusion that, in a perfect world, every important item in source code: every valuable variable, every function name/call, even every conditional - if, case, and/or, etc. - should be hyperlinked to a complete description of its usage and effects. Of course, this would require a specialized editor. Using your favorite text editor for your programming would no longer be an option. I'd probably complain about being locked into this kind of dedicated editor myself.
Such an editor would also not greatly reduce the scut work - you'd still have to write the descriptions and keep them updated. Yet it would settle the debate about whether program documentation should reside within the program text or outside in a formal or informal manual. You can find discussions and outright flame wars on this subject in online forums and elsewhere. I come down on the side of documenting within the source code, because human nature is such that outside documents are never kept in sync. People simply don't bother to maintain them properly.
Heck, even in-source commenting tends to get crufty over time. Let's face it: documenting code is less creative than writing it in the first place. It is not as much fun, so programmers tend to let it slide. Yes, I'm guilty of this too.
If this all sounds like overkill for your day-to-day programming - writing functions that will change, perhaps radically, as the overall system develops - then fair enough, you're right. All this commenting will just slow you down. But for code that will have some persistence, and especially code meant for others to use, like a library, complete and accessible documentation is a must.
As we'll see below, my task is to create THE library. That is, to come up with the basic primitive operations of a whole new programming language. A new language is one of the toughest sells there is. Without fantastically great documentation, it will have no hope of finding acceptance.
Well, it is not a perfect world, and as far as I know the magic editor doesn't exist - and unfortunately, complexity is about more than documentation anyway.
In 2010 I got involved in a discussion on the LinkedIn Group named System & Product Design Engineering. The simple question: "Why are programming languages still built on line-by-line instructions? Are you aware of other, more viable options?" ended up generating 120 wide-ranging comments about all the various development tools that try to take us beyond the line-by-line limitation of software development. Here follows my final post:
I'm with Jon - Just when I think I'm not going to follow this discussion anymore, it draws me back. The mafia strikes again. What brought me back this time? The recent discussions between Dirk and Emilijan about refactoring bloatware and sustainability, coupled with my having just read an online article by Joel Spolsky entitled "The Law of Leaky Abstractions."
And yes, it is on topic, because the law, which states that "all non-trivial abstractions, to some degree, are leaky," sometimes requires the programmer to go back and dig through the underlying line-by-line code to figure out why an abstraction is broken.
And what's an 'abstraction?' All the things we've been talking about: The GUI mouse click, or the OOP inheritance paradigm, or the etcetera, that lets you "write" many lines of code with a simple mouse click, or instantiation, or etcetera - except when it doesn't.
The 'leak' is when the fancy code-generation tool, of whatever type, doesn't generate the expected, working code. That's when you'd better really know the language underlying the tool, because you're going to have to go digging through the headers, files, and ultimately, lines, in order to figure out where their assumptions are different from yours. And then come up with a work-around.
One quote from the article: "And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder."
This paragraph, indeed the whole article, struck a deep chord in me. It summed up feelings that have been growing in my subconscious for years. I'm TIRED of sitting in front of a new app, tool, OS, whatever, and being uncertain from the very start about what's hidden from me that can bite me in the butt later.
I'm the kind of guy who wants to know what's under the hood. And I mean I want to know how the iron was mined, smelted and cast. How the oil was drilled, refined, and turned into plastic. This is a completely impractical attitude to carry into today's software engineering world. I'm not complaining, it is what it is.
Way back in the late 1980's, one of the unix gurus at my company had to dig into the underlying code of the new Sun workstations for some reason. Once, after following a function call chain back twenty two levels, he stated, "This system was created by well-intentioned barbarians." Software by committee. And that was two decades ago. Spolsky's article was written back in 2002. It is what it is.
Now that I'm semi-retired, I'm developing a Forth-like language for my own unmarketable computer architecture. Little doubt that the resulting language will be equally unmarketable. But I'm having real fun in front of a keyboard for the first time in years. In fact, I'm having a blast. I know what's under the hood and I know why I'm doing what I'm doing. Even my stupid mistakes are enjoyable.
And I feel a bit sorry for a graduate just beginning to earn a living creating commercial software these days. God bless, and more power to you - you'll need it.
In a follow-up to the above-named article, "Lord Palmerston on Programming," Spolsky gives us this quote: "Leaky abstractions mean that we live with a hockey stick learning curve: you can learn 90% of what you use day by day with a week of learning. But the other 10% might take you a couple of years catching up ... If you're building a team, it's OK to have a lot of less experienced programmers cranking out big blocks of code using the abstract tools, but the team is not going to work if you don't have some really experienced members to do the really hard stuff."
So why is it so hard? Most of the basic building blocks of both hardware and software are relatively simple. My non-computer-literate friends roll their eyes when I claim this, but it's true! Then what's the problem? It is that there are so many of these building blocks. So very many, and they are layered on each other into a thick cake.
There is virtually no way to have a modern, full featured computer system built up from the necessary hardware and software blocks without it being surprisingly complex. Surprising to me anyway. When I first started designing the Flexible System Architecture, my motto was "Keep it simple and straightforward." I'm not sure I can claim that as my motto anymore. I've tried, but there's just so much to implement. Hey, "It is what it is," right?
The question becomes: "If there's gotta be complexity, is there a best 'place' to put it?" I know where I'm going to try to put it: at the very bottom. Keep in mind that the FSA has a very rich basic instruction set. And many of these instructions are unusual - they are going to be unfamiliar to even an experienced assembly language programmer. Add to this the fact that I'm trying to tame this unruly collection with a Forth like language - a language that tends to be made up of lots of tiny, self-actuating pieces, and you have yourself a whole slew of what are known as "primitives."
At this point, the uncharitable reader may ask, "With all these primitives, where else is he going to put the complexity?" A valid point, but ... are there advantages to sinking the complicated stuff down to the lowest level?
The average computer user has no idea how many levels of complicated stuff reside beneath whatever application he is working on in whatever window is open. As an example, let's look at me, running my simulator (since I know it so well). I will describe it as it would operate on my newest computer, which as I write this happens to be in the shop, due to complexity issues.
The new system consists of an Intel CPU, which runs the programs, sitting on an Intel motherboard, which handles everything else that the CPU doesn't, like talking to the peripherals. The motherboard is very general purpose, but I wanted faster graphics so there's an NVIDIA graphics card plugged in. That's the hardware.
Now, to get to sim.exe, the simulator program, I have to write a lot of C code and then compile it with a Borland compiler. Borland also links in two modules from the Simple DirectMedia Layer (SDL) to let me have a nice, fast color display. I then run Sim under Windows XP, which lets it read in data from files like the nine lines of machine code that add the integers 3 and 5, introduced way back in English versus the Machine.
Now, Windows has an unbelievable number of layers of software that I won't even attempt to describe here, which let Sim also gain access to the keyboard, mouse, & display. But it gets better. On my new system I run Windows within something called VirtualBox, which runs under Ubuntu 10.10, a Linux distribution built on Debian. All to display a few bouncing bits on a screen.
I had a minor but annoying problem connecting to the Internet from the new system which I tried to solve by upgrading to Ubuntu 11.04. And everything pretty much collapsed. Hence the visit back to the computer store. Now, I like Ubuntu, but I'm afraid that version 11.04 isn't quite ready for prime time. The guy at the store agrees, because after trying to make everything work right again with 11.04 he went back to 10.10. I should get my system back tomorrow.
Too much geek information? Sorry, I'm just trying to illustrate that maybe there are too many levels at work here. So should I perhaps drop the whole Linux/VBox layer? Not that bad an idea in the short term, but you see, I ultimately want to port all my development to Linux. Why? Because Linux is a thinner layer cake than any version of Windows, and it gives the user far more access to the underlying levels. I can look under the hood and actually see nuts and bolts and springs, rather than the smooth sheet metal that Windows presents - unless you want to become a Windows guru, which I don't have time for.
Sim.exe is a relatively uncomplicated Windows app, compared to a typical commercial program written in C++ or Visual Basic or whatever, and calling on Windows APIs and the .NET system and who knows what all else. A very thick cake indeed.
In my research for this series of essays I ran across Wirth's Law (now apparently updated to Page's Law): "Software is getting slower more rapidly than hardware becomes faster." This is happening because most programs today are not written line by line, but rather chunk by chunk, object by object, library inclusion by etcetera. The result is layer upon bloated layer, with too many of the knotty intricacies hidden in the middle. I truly believe that most of the complexity in software should either be near the very top or the very bottom, where it is easier to get to.
We already know my complexity is going to be at the bottom. It is going to exist, so it has to go somewhere. Taming this complexity is a bit different from what the magic editor described at the beginning would provide. That would be for a user-defined program with user-created variables, functions, etc.
What is needed to handle all the low level machine language / Forth primitives is a big, rich, searchable database. Whether it ends up being a fairly standard relational DB (imagine: the first computer language with SQL as a standard utility) or something more exotic is something I have to think about, because to be useful it must also contain explanations, tutorials, and examples. For that, it would seem that a format supporting hyperlinks would fill the bill better than a vanilla DB. We'll see when the time comes.
When I first started designing the FSA, I let the 'F' stand for 'Flat' as often as it did for 'Flexible.' What that means is that someone programming an FSA-based chip could see and affect all the underlying control bits for the computing circuitry. For example, many of the bits in the screen shot on the GenAPro home page are control bits: which counter to use, whether that counter counts up or down, stuff like that. And if I've made any point about language design in these essays, it's that my idea of a perfect language is one that's similarly 'flat.'
I certainly won't deny that this combined HW/SW flatness has dangers. In the hands of a bad programmer, powerful languages can go powerfully haywire. Put one on top of powerful silicon and things can potentially get seriously mucked up. But as I asked in the previous blog, if the language isn't part of the solution, is it part of the problem? The same might be asked about the underlying hardware.
Since these essays are primarily about language rather than circuitry, let's return to the more recent question of whether there are advantages to sinking the complicated SW stuff down to the lowest level. I think the answer is yes. I believe, or at least hope, that the problem of opaque, leaky abstractions can be minimized when software is founded on a large, well ordered, well documented layer of primitives.
Why? Because there should ultimately be fewer abstraction layers. We've learned that companies tend to run better if there aren't too many levels of management, so can the same be said about programs? I hope to find out.
I sometimes call these writings a "blog," but by some definitions it isn't really, because there's very little of its production that's in real time. I thought I'd add a note here at the 2011 Summer Solstice to explain why this is the first "entry" since last Fall.
For one thing, I decided to learn a lot more about Lisp and its sibling Scheme by auditing the online MIT course Structure and Interpretation of Computer Programs. By about halfway through, I admit I was in over my head (probably because I wasn't doing the exercises). Still, I was definitely exposed to some deep and subtle thinking about computer languages. And now that I've got my fast computer back (finally!), I do indeed want to implement some of the simpler functional-language paradigms from the first sections of the MIT class.
The other thing that has kept me busy is that I have moved my work on the FSA architecture back to a front burner. I have been totally overhauling the one quarter of the instruction set that deals with tests and responses. It is definitely the most complex work I have yet done on the design - I'm glad I didn't try dealing with it first!
So I am now dealing with feedback loops. A subsection sends some test bits to another, essentially saying, "Here's this info, tell me what you want me to do next," and the program running in the second subsection (which is an FSA standard sequencer) has to generate a quick, efficient reply. How quick? It depends partly on whether the sequencer allows itself to be 'interrupted' ("Stop what you're doing and answer me right now!"), or whether the sequencer can get around to responding in its own good time, known as 'polling'. The number of control bits for the standard seq is growing ...
Another issue, which I planned for but ignored until now, is handling the traffic on the communication channels between and among subsections. It's known as 'bus contention'. Who gets the channel and how long do they keep it? And if two entities are cross linked, each waiting for information from the other while both are trying to hold the bus open, you can get 'lock out' and the whole system can hang with the equivalent of a traffic jam.
One way to help deal with this problem is to have multiple buses, and also to have them localized in various ways. Of course, this leads to even more control bits to deal with and simulate. Oh well, "It is what it is."