by Cornelis Robat, editor
This chapter will be revised constantly.
How It All Started
Enter the Information Age
programming the new tools
new methods required
What is a program
Creation of a program
Interpreters and Compilers
Back in the dark days of history many clerics and scientists designed, or
tried to design, some kind of artificial intelligent being, or whatever mimicked
one closely. This was mainly done to fool their followers, and in a way that purpose was fulfilled.
But the keyword here is Artificial Intelligence (AI). This is, in short, the
Holy Grail of all that has to do with computing. Even Leonardo da Vinci, who
probably created the first humanlike automaton, pursued this "stone of
Humanoids (robots) were Asimov's favorite subject in his books too. Robots have been remarkably popular in films and TV series to this day (e.g. Mr. Data of the Star Trek series). And who does not remember good old Frankenstein and his re-creation of a human 'being'?
What drives the development of AI at present is the demand that all new gadgets be smart, adaptive, responsive, intuitive, and so on. This puts a serious strain on developers to come up with solutions.
The secret to all this is AI, but to make it a reality new programming
languages were needed: languages that could mimic intelligence. And strangely
enough these languages were developed in parallel with the other languages.
One of the first and best known is LISP, developed in 1958/9 by the Artificial Intelligence Group at MIT under John McCarthy (McCarthy also coined the term AI). Lisp is used in so-called expert systems: you ask, the program answers. It is also used in situations where lots of data have to be interpreted, as in chess programs.
example of lisp:
;;; ================ ;;;
(DEFUN HELLO ()
Other examples of languages that are used in the AI field are: Prolog (PROgramming in LOGic, 1972, Alain Colmerauer)
example of prolog:
// the main program
Smalltalk (1979), Algol (1960), and, to a lesser extent, Simula (1967).
This is about rewriting routines and programming functionality over and over again, for each different part of a program and for each new program. There was a need for common, shared parts that acted on instructions like a black box. So once formal software development had been under way for some years, it was strongly felt that portability was one thing but reusability was another. And history repeats itself: what was written about subroutines above is also true for this idea of black boxes. OOP was introduced with the development of Smalltalk (in 1979) and became known as the Object Oriented Programming method, another significant step in the right direction.
There is a formal definition of OOP, but that goes beyond the scope of this paper.
To read more about this subject go to the next chapter: object oriented programming a primer
New Methods required
Programming became an 80-hours-a-week job. Debugging took as long as creating the software. Too much, some must have thought: let's do something about it and do some more business.
In the late 1980's Graphical User Interfaces were created by the same manufacturers that made software like C, Delphi, Clipper VO, and other languages, in order to expedite the creation of software. Though this kind of interface dates back as far as the 1960's, the idea never took off until the early 1990's.
Having a graphical interface is marvelous, but there must also be a way to put it to use and thus interactively build software. The answer was RAD: Rapid Application Development.
RAD offers a drag and drop interface that takes a lot of the manual work out of our hands: connecting to a database, getting data from it and putting data into it, interfacing with one or multiple users, sharing devices and resources, load balancing between multiple processors and networks, and so on. Point and click interfaces were created, and with a few button clicks and some forms with properties filled in by the programmer a completely new program could be created.
Thus far the results are not very satisfactory, mainly due to project definitions and organizational problems, but improvements are on their way.
OOP was and is very much part of the RAD method.
And the future? Who dares to tell? Everything is changing so fast.
But some glimpses are there:
Objects containing Artificial Intelligence, self propagating objects, interfaces allowing anybody to build applications intuitively, and so forth.
Now let's go back to earth and explain the insides of programming a bit.
Any computer only works from a set of instructions in order to execute a specific task.
A set of instructions combined into a file or object is often referred to as a program or class.
The contents of such a file are also called source code or logic.
Depending on the programming language and the syntax the source code is written in, the sources are optimized and translated. The product of this translation is what is often called machine code.
Once the source code has been compiled for a certain machine, the resulting code can most certainly not be read by a normal person, or rather by almost nobody.
This process is called compiling: in other words, the source
code is translated into something that can be understood by a computer's CPU.
With most modern languages a single compilation run results in machine language, so the CPU does not have to deal with an intermediate, pre-compiled form (also called opcode).
In short: the (readable) source code is translated into machine code that can be executed directly by a computer's processor.
Interpretation can be compared to translating your own mother tongue into Esperanto. The computer does not understand this language, but it knows how to translate Esperanto into its own language: machine code.
However, translation can take place on different levels: from a single pass
that yields code the CPU cannot yet run directly (also called opcode), through
various transitional stages of compilation, up to source code that is
"translated" into 100% "computer language".
The latter means a self-contained running program, called an "executable".
This translation or compilation can be done either by software, as explained above, or by "hardware".
An example of the first method, interpretation, is that the computer reads the
opcode and translates it while running the program. The early forms of BASIC are
an example of this.
The latter method mostly takes the form of an EPROM, mounted on a computer board, that contains the instructions (microcode).
As an example, look at how Occam, a programming language for parallel processors, is implemented: it is integrated in a chip mounted on a board.
Again: what is a program?
Some explain it like this:
A program is a sequence of instructions that can be executed by a computer (CPU) and that performs a certain task intended by the programmer.
Within a program actions are defined that should be executed by a computer. These actions are put in a predefined order: a logical path. And, if done correctly, in such a way that a fast, optimal and efficient execution is assured.
This sounds more scientific and surely takes more words, but it carries the same message.
Creation of a program
In the beginning days of software development the situation was:
There was no client
Or rather: the end user, analyst, designer and programmer were the client, all in one. One was one's own client. There was no communication other than with coworkers, let alone that people outside your group understood what you were talking about, if you were allowed to talk about it at all.
Traditionally programs are made in a series of production phases starting with Analysis:
This picture can also be seen as the life cycle of a program. As each production phase is a cycle in itself, each phase also has its own end product.
MoSCoW means: Must have, Should have, Could have and Want to have (wish list).
Though testing is historically underestimated, it is almost as important as creating the software. If software is not tested, a lot of bugs (errors) will show up during the implementation phase and later in production. The customer will not be amused.
More detail in another chapter: Stages of program development (under construction)
To make a program suitable for a particular CPU (processor) the source code is compiled, as explained above. The result of that process is called an executable. This executable cannot run on another machine with a different type of CPU (e.g. Motorola, Intel, RiscPa, Sun Sparc ... ),
because the instructions to add two numbers can be as different for different processors (the real computer) as Chinese and English.
But for some programming languages a recompilation on a different machine with a different CPU (the target) will do the job. Simple as that. The new executable will then be able to run on that particular type of machine. The computer language 'C' is an excellent example of this technique. The how and why will be explained on a separate page dealing with C, and in context with other languages below.
This process of carrying source code over and recompiling it on different machines (CPUs) is called "porting": bringing it from one platform to another.
There are many forms of compilers and interpreters; over 3000 have been identified
so far. See the languages index
for a list of the most common ones.
For a user the outward differences between the various languages and their dialects or variants are sometimes vague, to say the least.
The development of languages is still changing day by day. But
at the same time languages are becoming more and more accessible to people other than
hard core programmers.
In the late nineties visual interfaces were designed to take away the chores of setting up projects, linked lists, memory management, and the like.
At the beginning of the third millennium interfaces are made in such a way that you "almost" do not need to know a programming language in order to be able to make a program.
Now and again scientists say that in some 15 years we will no longer
need programmers.
But then who is writing the programs that make programming superfluous? The classic princess in the tower parable.
But nonetheless in the far future you may expect to only have to define your needs and interfaces to create a program.
In the meantime creating such interfaces will be beneficial to both manufacturers (more clients) and users (ease of creating projects). And that in turn stimulates the creation of easier to understand computer languages, or so-called interfaces.
Interpreters and Compilers
As said before, there are two methods to "translate" source code into computer language.
The first one is called an INTERPRETER and is also the oldest type.
The second kind is called a COMPILER, meaning as much as composer.
And it would not be human if there were no exceptions. There are some other methods, sometimes combining interpreters and compilers. Many old hands still know GFA BASIC from their early hacking days. GFA BASIC was, and I believe still is, available for the Amiga, Atari and IBM or compatibles.
These languages make Esperanto from the source code, which is then translated for the CPU.
An interpreter is a program that executes instructions that are written in the form of a program.
The trick of an interpreter is that it loads the source code and translates
the instructions into executable machine language line by line. And it does this
over and over again, every time the program is run.
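The line by line behaviour described above can be illustrated with a small sketch in Python. The toy language (LET, ADD, PRINT) and the function name `run` are invented for this illustration and do not correspond to any real BASIC dialect; a real interpreter is of course far more elaborate.

```python
def run(source):
    """Interpret a toy program one line at a time and return what it printed."""
    variables = {}
    output = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0].upper(), parts[1:]
        if command == "LET":        # LET X 5  stores 5 in X
            variables[args[0]] = int(args[1])
        elif command == "ADD":      # ADD X 3  adds 3 to X
            variables[args[0]] += int(args[1])
        elif command == "PRINT":    # PRINT X  outputs the value of X
            output.append(str(variables[args[0]]))
        else:
            raise ValueError("unknown instruction: " + line)
    return output


program = """LET X 5
ADD X 3
PRINT X"""

print("\n".join(run(program)))
```

Every call to `run` reads and decodes the source text again from the first line, which is exactly the repeated work that makes this approach relatively slow.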
In order to run a program you have to load the interpreter first (e.g. "load BASIC"), then load the source code into the computer ("read spaceinv") and then type "run" to execute the program.
Because of all this an interpreter is relatively slow.
There are interpreters that generate machine code for the program in its entirety before it is executed. Such a language is a hybrid. The processing time is shorter and the program runs faster.
This method is only useful if the original source code is relatively large, so that the faster execution compensates for the time lost translating source code into object code and then machine code, because this process too is repeated every time the program is loaded.
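As a rough illustration of translating a whole program before running it, the sketch below uses Python's built-in `compile` and `exec` functions. Note the assumption: Python translates to bytecode for its own virtual machine, not to machine code for the CPU, so this only mirrors the idea of the hybrid method rather than reproducing it.

```python
source = "total = sum(range(1, 11))"

# Translate the whole source text once, before anything runs.
code = compile(source, "<example>", "exec")

# The translated form can then be executed repeatedly
# without reading and decoding the source text again.
results = []
for _ in range(3):
    namespace = {}
    exec(code, namespace)
    results.append(namespace["total"])

print(results)
```

The translation step costs time up front, which is why it only pays off when the translated program does enough work afterwards.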
Another disadvantage is that the user needs to have a (legal?) copy of that specific interpreter. That can be a setback for the user because these interpreters are mostly expensive. It can also be a disadvantage for the designer of the program, who simply gives away his or her trade secrets and no longer has control over the program, its distribution, or unwanted adaptations.
At the end of the second millennium companies started to give away the source
code of their operating systems (SUN Solaris).
Or an operating system is freely available as open source from the beginning, like LINUX. Or the operating system is just given away (BeOS).
In short a COMPILER is: the translator of the source code into computer language.
The object codes (modules) that a compiler creates from the source code files are not yet suitable for a computer to run, though that depends on what kind of system you use.
The object code contains not only the instructions given by the programmer but also instructions for the computer about memory allocation and references to external locations and subroutines (libraries).
Object modules cannot be used by a computer directly because pieces are still missing and the modules are not yet in the correct order, a sort of dependency.
To get everything straight a so-called "LINKER" is used. The linker accepts one or more modules, resolves external references and binds the necessary routines from libraries into the program. Finally it relocates memory blocks inside the program so that one piece will not overwrite another part of the program in memory.
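What a linker does can be sketched in a few lines of Python. Everything here is hypothetical: the module format (a dictionary with "defines" and "needs"), the routine names and the sizes are invented for illustration, and a real linker works on binary object files rather than dictionaries.

```python
def link(modules, library):
    """Combine object modules, pull in needed library routines, assign addresses."""
    # Collect every routine (name -> size) defined by the modules themselves.
    routines = {}
    for module in modules:
        routines.update(module["defines"])

    # Resolve external references: fetch missing routines from the library.
    pending = [name for module in modules for name in module["needs"]]
    while pending:
        name = pending.pop()
        if name in routines:
            continue
        if name not in library:
            raise KeyError("unresolved external reference: " + name)
        routines[name] = library[name]["size"]
        pending.extend(library[name]["needs"])

    # Relocate: give each routine its own, non-overlapping block of addresses.
    addresses, next_free = {}, 0
    for name, size in routines.items():
        addresses[name] = next_free
        next_free += size
    return addresses


modules = [{"defines": {"main": 10}, "needs": ["print_line"]}]
library = {
    "print_line": {"size": 4, "needs": ["put_char"]},
    "put_char": {"size": 2, "needs": []},
    "sqrt": {"size": 8, "needs": []},   # never referenced, so never linked in
}

layout = link(modules, library)
print(layout)
```

Only the routines that are actually referenced end up in the program, and each one gets a start address that cannot collide with another piece.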
Finally everything is neatly put together and written to a disk or other form of permanent memory.
The result is an executable file or program.
The now ready made program can be run independently and be loaded and executed by the operating system
In the case of a DOS program (yes, there are still many around) the end product will always be stored in EXE format (see chapter on DOS). There is also a program called EXE2BIN that can convert relatively small EXEs into COM executables. These are somewhat more compact and run, to some extent, faster.
A common question is:
Is there a standard version of any higher level programming language?
The answer is NO
An important problem with higher programming languages is that they are seldom portable.
This means that a program written in FORTRAN or BASIC for a specific computer will not always run on another type of computer from a different brand. Even within a certain brand this could be a problem.
The cause of all this is that most manufacturers create their own standards in programming languages. The basis is often the same, but the instructions for input and output, and the extensions manufacturers think they need, are different most of the time.
But there is hope. Manufacturers of programming languages are slowly coming to their senses at the beginning of the third millennium.
Still, most sources have to be partly rewritten to make them suitable for other machines. This rewriting is often called "porting" (from 'portable': to carry over).
This is where the language "C" becomes important. This language has a high portability because all input and output commands have the same syntax. The translation happens via the code that is incorporated in the so-called libraries. A library is a kind of repository of program routines, ranging from listening to the keyboard to managing memory, that are specific to the computer on which the program has to run.
When you load the source of a C program on a different machine, the only thing the compiler has to do is recompile and link the I/O routines and other machine-specific functions, but now with a different library written for that specific machine. In this way (almost) no code has to be rewritten, only recompiled. This is one of the reasons C became a very important language, though, because of its rather technical inclination, not a very popular one.
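The idea of one unchanged source combined with a different library per machine can be sketched as follows. The sketch is in Python rather than C to keep it short, and the two "machines", their libraries and the routine name `put_line` are all made up for illustration.

```python
# Two hypothetical machine-specific libraries offering the same routine name.
LIBRARY_MACHINE_A = {"put_line": lambda text: text + "\r\n"}  # lines end in CR+LF
LIBRARY_MACHINE_B = {"put_line": lambda text: text + "\n"}    # lines end in LF


def program(library):
    """The 'portable source': it only calls routines by their common name."""
    return library["put_line"]("HELLO") + library["put_line"]("WORLD")


# The same source is combined with a different library on each machine.
output_a = program(LIBRARY_MACHINE_A)
output_b = program(LIBRARY_MACHINE_B)
print(repr(output_a))
print(repr(output_b))
```

The body of `program` never changes; only the library bound to it does, which is the part the article attributes to the recompile and link step.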
One of the first instruction sets that could be seen as an easy to understand computer language was BASIC, an acronym (letter word) for: Beginner's All-purpose Symbolic Instruction Code. Until the end of the second millennium this language was almost always bundled with a new computer.
In the 1980's a new breed of promising languages showed up. These are the
so-called 4GLs (Fourth Generation Languages). A 4GL allows the user to
define his wishes or intentions in a certain form of expression. The 4GL program
or compiler then generates the final compiled program for the user.
A programmer can add a few bells and whistles but has to keep that to a minimum in order not to endanger later updating of the program.
Environments like that are called CASE (Computer Aided Software Engineering) or Workbench environments
True: a program developed that way does not always have user friendliness in its list of top priorities. But when managed well, a project with these tools is mostly finished within time and budget. And that cannot often be said of projects being built without such tools.
During the 1950's the first computers were programmed by changing the wires and setting tens of dials and switches, one for every bit. Sometimes these settings could be stored on paper tapes that looked like the ticker tape from the telegraph (a punch tape) or on punched cards. With these tapes and/or cards the machine was told what to do, and how and when to do it.
To have a flawless program a programmer needed to have a very detailed knowledge of the computer he or she worked on. A small mistake caused the computer to crash.
Because the first generation "languages" were regarded as very user
unfriendly, people set out to look for something else, faster and easier to understand.
The result was the birth of the second generation languages (2GL) in the mid 1950's.
This generation made use of symbols; these languages are called assemblers.
An assembler is a program that translates symbolic instructions to processor instructions. (See above for an example) But deep in the 1950's there was still not a single processor chip but a whole assembly rack with umpteen tubes and/or relays.
A programmer no longer had to work with ones and zeros when using
an assembly language. He or she could use symbols instead. These symbols are called
mnemonics because of their mnemonic character (STO = store).
Each mnemonic stands for one single machine instruction.
But an assembler still works on a very low level with the machine. For each processor a different assembler was written.
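The one-to-one translation from mnemonic to machine instruction can be shown with a toy assembler in Python. The instruction set below is invented: apart from STO, which the text mentions, the mnemonics and all numeric opcode values are assumptions chosen for illustration and belong to no real processor.

```python
# Hypothetical opcode table: mnemonic -> numeric machine instruction.
OPCODES = {"LDA": 0x01, "ADD": 0x02, "STO": 0x03, "HLT": 0xFF}


def assemble(source):
    """Translate each mnemonic line into one (opcode, operand) pair."""
    machine_code = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        machine_code.append((OPCODES[mnemonic], operand))
    return machine_code


listing = """LDA 10   ; load the value at address 10
ADD 11   ; add the value at address 11
STO 12   ; store the result at address 12
HLT"""

print(assemble(listing))
```

A different processor would need a different `OPCODES` table, which is why a separate assembler had to be written for each one.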
At the end of the 1950's the 'natural language' interpreters and compilers were made. But it took some time before the new languages were accepted by enterprises.
About the oldest 3GL is FORTRAN (Formula Translation), which IBM started developing around 1953 and first released in 1957. This is a language primarily intended for technical and scientific purposes. Standardization of FORTRAN started 10 years later, and a recommendation was finally published by the International Organization for Standardization (ISO) in 1968.
FORTRAN 77 is now standardized.
COBOL (= Common Business Oriented Language) was developed around 1959 and is, as its name says, primarily used in the business world, up till now.
With a 3GL there was no longer a need to work in symbolics. Instead a programmer could use a programming language that more closely resembled natural language, be it a stripped version with some two or three hundred 'reserved' words. By the 1970's the now well-known so-called 'high level' languages like BASIC, PASCAL, ALGOL, FORTRAN, PL/I, and C had been born.
A 4GL is an aid which the end user or programmer can use to build an application without using a third generation programming language. Therefore knowledge of a programming language is, strictly speaking, not needed.
The primary feature is that you do not indicate HOW a computer must perform a task but WHAT it must do. In other words the assignments can be given on a higher functional level.
A few instructions in a 4GL will do the same as hundreds of instructions in a lower generation language like COBOL or BASIC. Applications of 4GLs concentrate on daily tasks such as screen forms, requests for data, changing data, and making hard copies. In most of these cases one deals with Data Base Management Systems (DBMS).
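The difference between saying HOW and saying WHAT can be made concrete with a small Python sketch. As an assumption, SQL (through Python's standard `sqlite3` module) stands in here for a 4GL-style request to a DBMS; the article does not name a specific 4GL, and the table and data are invented.

```python
import sqlite3

orders = [("alice", 30), ("bob", 5), ("alice", 12), ("carol", 50)]

# HOW: spell out every step of the computation by hand.
totals = {}
for customer, amount in orders:
    totals[customer] = totals.get(customer, 0) + amount
how_result = sorted(totals.items())

# WHAT: describe the desired result and let the database engine work out the steps.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)", orders)
what_result = db.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
db.close()

print(how_result == what_result)
```

Both halves produce the same totals per customer, but only the first one dictates the order of the individual operations.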
The main advantage of this kind of languages is that a trained user can create an application in a much shorter time for development and debugging than would be possible with older generation programming language. Also a customer can be involved earlier in the project and can actively take part in the development of a system, by means of simulation runs, long before the application is actually finished.
Today the disadvantage of a 4GL lies more in the technological capacities of hardware. Since programs written in a 4GL are quite a bit larger, they need more disk space and demand a larger part of the computer's memory capacity than programs written in a 3GL. But hardware of a technologically high standard becomes more available every day, though not necessarily cheaper, so in the long run these restrictions will disappear.
Considering the arguments one can say that the costs saved in development could now be invested in hardware of higher performance and stimulate the development of the 4GL's.
In the 1990's the expectations of 4GLs were too high, and their use was only picked up by companies such as Oracle and SUN that had enough power to pull it through. However, in most cases the 4GL environment is misused as a documentation tool and a version control implement. In very few cases does the use of such programs increase productivity. In most cases they are only used to lay the basis for information systems, and programmers use all kinds of libraries and toolkits to give the product its final form.
This term is often misused by software companies that build programming environments. To this day one can only see vague contours. When one sees a nice graphical interface it is tempting to call that a fifth generation. But alas, changing the makeup does not turn a butterfly into an eagle.
Yes, some promising-sounding impressions are communicated from the professional circles that are making these environments.
But again, the fifth generation only exists in the brains of those trying to design it. As yet!
Many attempts are made, but they are stranding on the limitations of hardware and, strangely enough, on the views and insights into the use of natural language. We need a different way of speaking for this!
But it is the direction these languages will take: no longer prohibiting the use of natural language, and allowing an intuitive approach towards the program (language) to be developed.
The basis for this was laid in the 1990's by using sound, moving images and agents, a kind of advanced version of the macros of the 1980's.
And it is only natural that neural networks will play an important role.
Software for the end user will (maybe) be based on the principles of knowbot agents: autonomous, self-changing pieces of software that create new agents based on the interaction between the end user and the interface. A living piece of software, you might say. And humanlike DNA / RNA (intelligent?) algorithms can play a big role here.
Which high-level languages are in popular use today?
There are more than 200 other high-level languages such as PASCAL, FORTH, PL/I, LISP, SMALLTALK, APL, C/C++ and PROLOG to name but a few. Many of these were developed for particular applications, while others are developments or improvements of existing languages. As an example, COBOL is a widely used language for business applications and has an almost English-language type of structure, whereas APL is a mathematically oriented language that uses symbols and equations rather than written words.
One of the first languages for general usage to be developed was BASIC, an acronym (letter word) for: Beginner's All-purpose Symbolic Instruction Code. This language is so widespread that, until the end of the 90's, almost no machine was sold without some dialect of BASIC, be it as a stand-alone installation or incorporated into an office application like Word or Excel.
For scientific applications, though BASIC is often used, the language of choice is FORTRAN (Formula TRANslation), which was initially developed by IBM for their large computers. Another similar language called ALGOL (ALGorithm-Oriented Language) has also been used for scientific work but has not achieved the popularity of FORTRAN. Both FORTRAN and ALGOL use statements similar in form to those of BASIC, although their syntax rules are usually rather more complex, and programs in these languages tend to be a little more difficult to write than programs in BASIC.
The ALGOL language was favored in academic circles because it tends to be more flexible than FORTRAN and more readily allows the construction of 'structured' programs. A development of the ALGOL idea is the language called Pascal.
Pascal was developed by Niklaus Wirth (published in 1973) and is gaining on BASIC in importance, mainly in educational institutions. Pascal was standardized by ISO in 1983.
An improvement on PASCAL is MODULA, by the same author, which puts much more stress on the structural architecture of programs.
Languages such as FORTRAN and BASIC are primarily designed for performing calculations and evaluating equations. They are generally less effective when dealing with text or the relationships between sets of data. A number of high-level languages, such as LISP, have been developed to deal with these applications. LISP (LISt Processing language) itself has become very popular for work in the study of Artificial Intelligence (AI), where the emphasis tends to be on data relationships rather than formula evaluation.
LOGO is a simplified variation of the LISP language and has recently become quite popular in education. One interesting part of this language is its 'turtle graphics' in which instructions are given to a robot (turtle) carrying a pen which may be moved around a sheet of paper or screen to draw pictures.
The attraction of FORTH is that it can easily be extended to suit one's needs. This language is highly standardized, so there are almost no different dialects in circulation.
'C' was originally designed for and implemented on the UNIX operating system on the DEC PDP-11 by Dennis Ritchie. The great advantage of C lies in the fact that it is not tied to any particular hardware or system, and it is relatively easy to write programs that will run without change on any machine that supports C. C is now one of the most important languages in use on 680xx CPU based machines.
Java, developed by engineers at Sun, was originally intended for embedded
computing and is largely based on C++.
Sun decided to give it to the world as an open source language, though Sun is very strict in maintaining compatibility and standardization.
Since its publication Java has taken off like a rocket in popularity and implementation base. Many web applications are built in Java. Front end applications in banking, insurance, and other large companies based on centralized data processing but distributed access often have their interfaces built in Java, or are in the process of getting them.
Most influential Computer Languages(4)
Konrad Zuse in Nazi Germany may have developed the first real computer programming language, "Plankalkül", ca. 1945 (5).
According to Sammet, over 200 programming languages were developed between 1952 and 1972, but she considered only about 13 of them to be significant.
1980 dBASE II
1984 Standard ML
For the big list of computer languages see lang-list.txt maintained outside tHoCF.
This concludes the introduction to the history of software. See also the languages index.
For further reading: Impressions and thoughts on creating software, an essay
Last Updated on November 22, 2005. For suggestions please mail the editors.
Footnotes & References
3. Dennis M. Ritchie, Bell Labs/Lucent Technologies, Murray Hill, NJ 07974 USA
5. This is mentioned in the 1978 ACM History of Programming Languages FORTRAN session.