
SECTION 1 - Computing Today

Historical Thoughts

While the activity of counting objects and remembering things extends back to the earliest times of humans, the idea of a mechanical device that could aid in the counting process, or that could actually do the counting, is relatively recent. There are individual examples of record-keeping devices in human history, but these are few: the Quipu of the Incas and the Abacus of ancient China. The Greeks, Romans, Persians, and many other ancient cultures used variations of writing for the keeping of records or the remembering of partial answers in computations - for instance, wax tablet and stylus, clay tablets, papyrus and ink, and, later, paper. But the idea of a machine that could actually do the work of computing, rather than simply aiding the human in doing the thinking, dates only to the seventeenth century.

The computer as we know it today is the product of several major eras of human technology. Technology is the application of tools and techniques to improve the likelihood of human survival. Beyond the survival aspect, the use of tools and techniques to solve non-essential, but still needed or interesting, problems has given rise to many great inventions - the automobile, the bicycle, the radio, and so on. The evolution of the computer spans these phases of development:

1. The Mechanical era, in which the "industrial revolution" provided the mechanical techniques and devices needed to build machines of any sort;

2. The Electronic era, in which the use of electrical devices and techniques made mechanical methods obsolete; and

3. The Semiconductor era, in which the relatively new science of semiconductor physics and chemistry extended the original ideas of the Electronic era to new heights of performance.

A few events will illustrate these time frames:

• 1642: Blaise Pascal designs and builds a decimal counting device that proves that mechanical counting can be done.

• 1804: Jacquard devises a weaving loom that uses a chain of punched cards to control the passage of the shuttles through the warp, thereby defining the color pattern and texture of the cloth.

• 1822: Charles Babbage presents his Difference Engine to the Royal Society, London, where he demonstrates the mechanical computation of logarithms.

• 1833: Charles Babbage turns to the design of his Analytical Engine, a general-purpose computing machine; it is never completed.

• 1830's: Ada Augusta, daughter of the English poet George Gordon, Lord Byron, works with Babbage to develop a schema for laying out the logical steps needed to solve a mathematical problem, and becomes the first "programmer". She also invests most of her husband's money in a scheme, built on Babbage's work, to beat the horse races - they lose their shirts.

• 1860's: "The Millionaire", a machine that could multiply by repetitive addition is announced. • 1888: In response to an invitation from the US Census Bureau, Herman Hollerith presents

his tabulating machines including a card punch, card reader, tabulator (electric adding machine), and card sorter. He wins the contract for equipment for the 1890 census.

• 1900's: Hollerith's machines are a success, and he sells them to countries for censuses and to companies for accounting. His patents are stolen by competitors. He joins with two other companies to form the Computing-Tabulating-Recording (CTR) Company (tabulating machines, time clocks, and meat scales).

• 1914: The CTR company hires Thomas J. Watson, Sr., as its leader. His job is to beat the competition and put the outfit on the map. He immediately starts training salesmen on company time, and in 1924 renames the company the International Business Machines Corporation (IBM).

• 1920's: IBM and several others manufacture ever-more complex electro-mechanical tabulating equipment. The stock market crash of October, 1929, puts millions out of work and many companies fold. IBM reduces its activities, but never lays anybody off. It hires more salesmen.

• 1935: President Franklin Roosevelt signs the Social Security Act, which requires that everybody in the country have a number. Great quantities of tabulating equipment are purchased to support this effort.

• 1939: Vannevar Bush demonstrates the Differential Analyzer, the last great purely mechanical calculator.

• 1941: The United States enters World War II. Most companies refit for the manufacture of munitions.

• The War Years: Many new advances are made in electronics that will have an effect on the tabulating business after the war, including radio, radar, sonar, and television.

• 1943: J. Presper Eckert and John Mauchly are given a contract to develop a purely electronic calculator for the computation of artillery shell trajectories. They build the Electronic Numerical Integrator and Computer (ENIAC) at the Moore School of Electrical Engineering, University of Pennsylvania.

• 1945: John von Neumann describes the EDVAC (Electronic Discrete Variable Automatic Computer) in a report for the Moore School - the first design to use the stored program concept. He later builds a machine on the same plan at the Institute for Advanced Study.

• 1947: The transistor is invented by Shockley, Bardeen, and Brattain at Bell Laboratories.

• 1951: A company started by Mauchly and Eckert to build electronic computers goes broke and is bought by Remington Rand. With this help, the two deliver the first UNIVAC (UNIVersal Automatic Computer) to the US Census Bureau, the first computer sold for commercial, non-military purposes.

• 1955: IBM introduces the 704 series computers, the first large-scale systems with magnetic core storage and built-in floating-point hardware.

• 1959: IBM introduces the 1401 and related systems, bringing card-based data processing to the average company.

• 1964: IBM bets the company on the introduction of the System/360, using hybrid microcircuits and mass-produced core storage devices, and the idea of the "non-dedicated", microprogrammed system. The product line is upward-compatible - a huge success that ultimately defines the mainframe market.

• Late 1960's: Several companies begin to develop and deliver true integrated circuits.

• 1974: The Intel Corporation delivers the first integrated circuit capable of executing a fully usable program, the Intel 8080. The microprocessor is born.

• 1976: The Apple Computer Company is started by two college dropouts in their garage, Steve Jobs and Steve Wozniak. Their first machine is sold in kit form; its successor, the Apple II (1977), uses inexpensive parts and the home color television to bring computing to the masses. A BASIC interpreter for the machine is supplied by Bill Gates's Microsoft.

• 1981: IBM introduces the IBM Personal Computer, and coins a term that will live forever. At first aimed at the home market, the PC is immediately adopted by businesses large and small. Since the design of the system is published, many begin to write programs for the machine and to copy its design. The use of the Intel 8088 processor ensures Intel's survival. Microsoft provides the Disk Operating System (DOS).

• 1987: IBM introduces the Personal System/2 (PS/2) product line, including the Intel 80386 processor, the Micro Channel bus, and the OS/2 operating system.

• Late 1980's: The Windows operating shell produced by Microsoft provides a Graphical User Interface (GUI) for users.

• 1990's: Intel introduces the 80486, the Pentium, and the Pentium Pro processors. Speeds approach 200 megahertz. Advances in memory semiconductors permit millions of characters of storage to be available for a small price.

• Further advances in microprocessor technology bring us the Pentium 2, 3, and 4 processors, the AMD Athlon, the PowerPC, and clock speeds passing 3 gigahertz. Embedded processors are used in everything from automobiles to refrigerators, spacecraft, and graphics accelerators for display.

Introductory Terms

A primary purpose of an introductory course in computers is to provide definitions of the terms commonly used in the business, and to ensure that the student understands what the terms really mean. We shall define a set of terms as an introduction, and provide narrative where suitable.

Information in any business or science is the core of the concern's operation. With good information, a business can keep track of its accounting, improve sales by knowing its competition, ensure that its employees are paid, and carry on the business at hand. If information is lacking or inaccurate, a concern can suffer and may not survive. Computers are excellent tools for remembering, processing, and manipulating such information, hence they have become indispensable in business and industry.

Information in the computer business is usually called data, a Latin word meaning information. Strictly, data is the plural, indicating many lumps of information, while datum is the singular form, indicating one lump. Data can take two physical forms in computers. The first is raw data such as time cards, order sheets, billing statements, invoices, bills of lading, etc. The important information is present in these forms, but the computer can't use it right away. The data must first be converted to computer-usable form, which the machine can then take in and use without further delay.

This conversion traditionally took a good deal of time and expense. The classic example is a room full of hundreds of keypunch machines in which tabulating cards are punched with hole patterns that specify particular items of data. Later technology has allowed the conversion process to be automated or eliminated altogether, as in the use of a credit card in an Automated Teller Machine. In this case, the card is already in computer-usable form by virtue of the magnetic stripe on the back which is encoded with the essential information of the card holder.

The computer-usable data are entered into the system by an input device, such as a card reader, light pen, or credit card reader. The word input has multiple meanings in the business and here are two of them. The input device is the mechanical and/or electrical means by which the data enters the circuits of the system from the outside world. The computer-usable form of the information thus entered is the input data. Many words in the business have multiple meanings.

The data thus entered are processed or converted in some way to a form more usable to the operators. For instance, a time card may form the input data, and from the information stored on it can be determined the number of hours worked, the employee's name, etc. This data, stored with other information such as rate of pay and tax information, can then be returned to the human world in a more useful form, such as a paycheck for the worker and a W-2 for the IRS. This returned information is called output data, and the printer on which the check is made is an output device. Hence, the word output also can be defined in more than one way.

In between the capture of the input data and the resulting output data is the processing. This is the conversion of the one form of data to the other. The processing is done by the Central Processing Unit (CPU), which is the "magic box" that does all the work and which has held so much intrigue for the uninitiated over the years. If a CPU is built around a microprocessor device such as the Intel Pentium, then it may be referred to as the Microprocessor Unit, MPU.

The computer system consists of two major parts, the hardware and the software. The hardware is the actual mechanical and electronic device that does the work of deriving results data from given data. The hardware is designed according to the theory of computing and according to what can be done with the semiconductor devices of which the CPU is made. However, it cannot in itself do the whole job. It must be instructed how to use its circuitry to arrive at the desired results defined by the human beings working with it. The humans create this plan of processing, or program, using software, that is, the programming language and its syntax. So the software runs on the hardware, and the combination follows the plans of the humans to process input data into output data.

Data Storage

Data are stored in the computer and its related parts using the Binary numbering system. Whereas the Decimal system uses ten different symbols (0, 1, 2, . . , 9) in computation, the Binary system uses only two, 0 and 1. This is because two-state conditions abound in the physical world: a switch can be open or closed, the light can be on or off, etc. Building circuits that can exist in two different states is easy, and so we use the Binary system because it fits nicely into this plan.

The smallest unit of data is the bit, short for binary digit. A bit can be either a 0 or a 1. If taken in groups of eight, bits become a byte. The byte unit is very common as a means of remembering simple lumps of data because it can be easily handled by simple circuitry and can represent a variety of different things in various codes (as we shall see). Two bytes together are called a word in the type of computers that we will be working with initially. The locations of bits within a byte or word are referred to as bit positions. They are numbered like this.

In a byte:

7 6 5 4 3 2 1 0

In a word:

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
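
Readers with a programming language handy can experiment with bit positions directly. Here is a minimal sketch in Python (the language and the sample value are illustrative choices, not part of the original notes):

    value = 0b01001100                     # bit positions 6, 3, and 2 hold 1's

    for position in range(7, -1, -1):      # examine bit 7 down to bit 0
        bit = (value >> position) & 1      # isolate one bit position
        print("bit position", position, "contains", bit)

    print(value)                           # 76, the decimal value of the byte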

It is common to use large numbers to represent the number of bytes involved in memory or disk storage. A kilobyte is 1,000 bytes, a megabyte is 1,000,000 (one million) bytes, and a gigabyte is 1,000,000,000 (one billion) bytes. Actually, these values are not quite correct, but will do for the moment. There are variations on these terms as you will see later in the course.

Similarly, we have terminology for increments of time. Since the CPU is operating at such a high speed, it is common to refer to the very small increments of time needed for single operations. If a second is the standard increment of time, then a millisecond is 1/1,000 of a second, a microsecond is 1/1,000,000 of a second, and a nanosecond is 1/1,000,000,000 of a second. These amounts are incomprehensibly short for the human being, whose average cycle time is about 1/18th of a second. However, computer circuitry regularly operates in these ranges. (Example: If a signal travels in free space at roughly the speed of light - 186,000 miles per second, or 300,000,000 meters per second - how far will it travel in one nanosecond?)
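
Working the example takes only rate times time; a one-line Python check (illustrative only):

    # distance = rate x time: 300,000,000 m/s for one billionth of a second
    print(300_000_000 / 1_000_000_000)     # 0.3 meters -- roughly one foot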

Storage and Memory

In addition to the entry and exit of data into and out of a CPU, the CPU contains many data paths, logical units, and functional parts. These will be discussed later. However, one of these parts has become so important in its own right that it should be presented from the beginning in the introduction of computer theory. Originally part of the CPU and discussed like any other, the part of the computer that stores data for processing, and the results thereafter, is both an essential part of the design and a frequent source of technical problems. This is referred to as memory or storage.

There are two types of memory mechanism in typical modern computers. The first is the main memory, or that circuitry which is directly accessible automatically and at high speed by the rest of the circuitry of the processor. In the early days, this device consisted of cores, or doughnut-shaped pieces of a magnetic ceramic material, that were strung like beads on a grid of wires. By passing current through the wires, binary 1's and 0's could be stored in and retrieved from the cores. These worked well, but due to the laws of physics there was a certain upper limit of performance that could be achieved. When semiconductor technology advanced to the point of making integrated circuits practical, semiconductor memory devices were a major product. You may be familiar with these as SIPs or SIMMs or DIPs used on the motherboards of personal computers. The term memory now generally refers to these semiconductor devices.

The term storage originally referred to the magnetic core system discussed above. However, the word is now used primarily to describe external data-holding mechanisms such as disk and tape drives. In the old days, disk drives and tape drives were referred to as bulk or auxiliary storage. We tend to think today of either floppy diskettes or Winchester-style small fixed disks or hard disks used in personal computers as storage. These devices have revolutionized the way data are handled in many computer systems, and a system can find itself so dependent on its drives that they define the computer's overall performance.

So, generally, the term memory refers to solid-state, printed-circuit board things, and storage to disk drives and similar devices.

Some terms involved with memory include:

• Read-Write Memory (RWM), typically your motherboard main memory, into which a computer can place data and from which that data can be later retrieved.

• Read-Only Memory (ROM), typically also found on the motherboard, but which can only be read from, and not written to.

• Random Access Memory (RAM), which is a term misused by those who really should say RWM most of the time.

The term RAM really means that a device supports the ability of the computer to access data in a random order, that is to store or retrieve bytes in a non-sequential way. Both RWM and ROM are RAM in nature. However, the acronym RWM is hard to pronounce, so RAM became the norm.

Certain types of memory and storage devices can remember what they contain with or without power being applied to the system. Such storage devices are called non-volatile. Magnetic disk drives and cores are good examples. Other types of memory need power to remain stable, or they will forget what they contain. The memory chips on a motherboard are a good example of this type, called volatile.

More on Hardware

In the good old days, you could tell whether a computer was a mainframe or a minicomputer by looking at it and measuring the floor space the cabinets took up. Today, with so much computing horsepower contained in such small devices, physical size is no longer a criterion. Today we measure computer size in throughput, that is, how many instructions the system can execute in a given amount of time. Depending upon the design of the system, we have computers whose work is measured in Millions of Instructions per Second, called MIPS. Some computers that are designed primarily with scientific processing in mind do many Floating Point Operations per Second, referred to as FLOPS. A floating point operation is one where the decimal point in a decimal fraction is taken into account and is included in the design of the numbers used by the computer. More on this later.

The original idea of a minicomputer was that it was smaller, slower, and cheaper than a mainframe, which traditionally cost a great deal and required a lot of space and people to work it. The minicomputer was almost "personal" in its design. This definition persisted until the advent of the microprocessor, at which time we had the microcomputer, which contained a microprocessor as its primary computing element. As the microprocessor device progressed in capability, the minicomputer became obsolete and the size of the traditional mainframe began to shrink. Microprocessors are now to the point that they can do what minicomputers and small mainframes did just a few years ago. Accordingly we have table-top or table-side systems, floor standing systems, and laptop and palm-sized computers. We measure these by throughput and performance, regardless of physical size.

The term supercomputer is used to identify large mainframe systems that are designed for particular types of scientific calculations. These systems are designed to work with numbers at great speed, doing everything from preparing weather maps from satellite data to keeping track of aircraft in the sky. They will remain a specialty item for that type of computing.

Just as the design and inner workings of the CPU have evolved with technology, so have the input/output devices evolved. In the beginning, the primary form of input was the reading of tabulating cards into which were punched holes in patterns or codes that represented numbers and letters. Although several such coding systems were devised, the Hollerith card code with 80 columns and 12 rows of holes became the standard. The cards were read, the data processed, and the result was either more punched cards or a simple printout of numbers and accounting data. The system was limited by how fast the cards could be moved, and some early tab systems had no storage at all.

Currently we have a variety of I/O devices that have taken advantage of technology. In the old days, the conversion process from human-usable form (orders, waybills, etc.) to computer-usable form (punched cards) was an essential step in the process. Now, many items used by humans every day are also computer-usable, such as credit cards, touch panels and screens, scanning laser badge and tag readers, etc. Every school kid is familiar with a mouse and keyboard, it seems, as these are easy to use if the software is provided.

Output today falls into two main categories, softcopy and hardcopy. Softcopy is what you see on the screen. It is soft because you can't take it with you except in your mind and memory. The Cathode Ray Tube (CRT) display of the video monitor of the typical personal computer is a prime example. Although the CRT is being replaced with Liquid Crystal and Plasma displays, the venerable video monitor is still the standard for video output. Hardcopy is a term used to indicate a piece of paper on which something is printed. This paper can contain the resulting data in the form of numbers and letters, pictures, and various other images, both in color and monochrome. The method of placing the image onto the paper has evolved also. Originally, the impression of a piece of type that pinches an inked ribbon against paper was the common method, and is called impact printing. The process is similar to a typewriter. We now can generate images using heat as in thermal printers, light as in laser printers, and with improved methods of the traditional printing means, as in dot matrix and inkjet printers.

Disk drives fall into the category of I/O as well as that of storage. Two types are currently in use in common systems, the "floppy" or diskette, and the fixed or hard or Winchester disk drive. The floppy is an IBM invention, and originally was released in an 8-inch diameter. This large diskette could store 256,000 bytes of information on one side. We now have diskettes in the 5.25-inch size (although this size is fast fading from view) and the 3.5-inch size, whose capacity is increasing with technology. The floppy is designed for portability, backup, and small storage, and is supported almost universally as a simple means of data exchange and retrieval.

The fixed or hard disk is also an IBM invention, and although many makers produce the devices, IBM holds the most design patents and has done the most to improve the capacity and reduce the size. The size of the device has been reduced from 28" to 14" to 8" to 5.25" to 3.5" to 1.8" in diameter, the speeds have increased from 1500 rpm to 7500 rpm, and the chemistry used to store the data as magnetic lines of force on the surface of the disk has undergone radical changes. Until such time as a solid-state device takes over the whole job of data storage, such drives will form the primary means of bulk data storage.

More on Software

Generally, software is divided into two categories. These are applications software and system software. Application software is the programs you would use to get a particular kind of work done. Examples are WordPerfect as a word processor, Excel or Lotus as a spreadsheet, etc. These are software packages that are interacted with directly by the user or operator of the computer. Enormous effort goes into writing ever-larger application programs. Programmers can specialize in applications of a specific nature, such as banking.

Systems software is used by the computer itself for its own management, or to support the application software. An example of this is the DOS used in personal computers. The computer itself consists of hardware and some small amount of programming in ROM, but to fully support an application such as WordPerfect, that is, to drive the screen images, work with the keyboard or mouse, save data on disks, and generate printouts, the application needs to ask DOS for help from the hardware. So the system software consists of the operating system itself, a wide variety of support utility programs, and programming support in the form of compilers for different languages.

So, who does the programming? The first programmer is considered to be Ada Augusta Byron, and she has a language (Ada) named after her. The first machines that could actually follow a stored program were built in the late 1940's on the plan of John von Neumann's EDVAC design. Von Neumann developed what we now call the stored program concept. While previous machines such as ENIAC simply took one piece of data at a time, processed it, and returned it to the world before taking a second piece, the stored-program machines took in a large amount of data, processed it all automatically, and then returned the entire result. While the problems ENIAC was to solve were defined by miles of patch-plug wiring that had to be removed and inserted for each problem, the stored-program design used the same mechanism that stored the data to store the coded steps that the machine was to follow automatically in processing that data. Thus, each step became an instruction, and the instructions together as a group formed a program. The program was stored in the same automatically accessible mechanism as the data it was to process. Arranging the steps that the computer will take to logically solve a problem, in the manner of Ada Byron, is called programming.

The job of programming goes to people of different skill levels and experience. Some choose to specialize in applications while others choose systems. Typically individuals become expert in some particular language or system architecture, and that will define their careers. Generally the beginning programmer, with community college or similar training, will start as a coding clerk, where the primary function is mastering a particular computer language and getting to know the system in use. The programmer then will write programs to solve problems. In some cases the problem is big or complex enough that a specialist is needed to lay out the plan for the programmer, and this person is called a system analyst. It should be noted just what a system analyst is. This is a person who is an expert in some field such as accounting, aircraft design, or environmental sciences, and who also is knowledgeable in computers and how they can be used as a tool to solve problems. The analyst is the technical expert of the particular project, and also knows computers well enough to guide others in the work of programming the various project parts.

An end user is a person who is the last one in the food chain in the writing and marketing of software. If you use WordPerfect to write a term paper, then you are the end user as far as WordPerfect is concerned. As such, you have significant importance and clout. End users can dictate to a certain extent which products survive and which products fail. The acceptance of WordPerfect in the marketplace is a classic example of a product that was at the right place at the right time and caught the public's fancy.

Specialty Items

Here are a couple of terms with which you can astound your friends and family.

• Multitasking is defined as the ability of a computer system to execute what appears to be several programs at the same time. Although this is not really what the computer does, it does switch between tasks so fast that it appears to be several computers instead of just one. Windows 2000 and XP and Unix are multitasking systems. It's a hardware/software combination.

• Timesharing is the apparent use of a system by several people at the same time. The classic example is a mainframe or large minicomputer into which many terminals with screens and keyboards are connected to the CPU. The system gives each person at a terminal a slice of time ("timeslicing") for their processing, after which the system moves on to the next user for his/her timeslice. This technique makes use of multiprogramming as it attempts to serve all the attached users who may be doing different things.

• Front-end Processors are computers that do initial processing and data form conversion before sending the concentrated data to a bigger, faster system. Examples include a supercomputer that accepts input only from a machine that builds problems for it to solve, or a communications processor that handles communications protocols that would only slow down the primary system.

• Embedded Processors are computers or microprocessors embedded within a larger system. These provide intelligence and control at a local level. Flight control computers within an aircraft cockpit are examples, as is the processor that controls the firing of sparkplugs in an automobile engine.

SECTION 2 - Hardware, Part 1

Numbering Systems and Codes

When working with computers it is necessary to deal with numbering systems other than the decimal system. While the decimal system has served mankind well for thousands of years, it is not easily adapted to electronics. The primary numbering system used in digital systems is binary, with the octal and hexadecimal systems going along for the ride.

The reason for the use of the binary system is that each position of magnitude can have only two possible values, 0 and 1. It happens that in the laws of physics and nature the two-state condition is easiest to implement. Switches can be open or closed; current can be flowing or not flowing; current can be traveling left to right, or right to left, within a wire; magnetic lines of force can be clockwise or counterclockwise around a core; lamps can be on or off. Computers make great use of circuits called bistable multivibrators, or flip-flops, that are stable in two different electrical conditions. So, it is extremely easy to implement the binary system in electronic devices.

By contrast, the decimal system can have ten values or symbols in each position: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. It would require a device with ten different stable states to directly implement a purely decimal computer.

The binary system is based on the powers of the number 2, starting with 2^0, which is equal to the number 1 in decimal (any number raised to the 0th power is equal to 1). The next order or position of magnitude is 2^1, equal to 2 (any number raised to the first power is equal to itself). The same applies for the higher powers of two: 2^2 = 4, 2^3 = 8, 2^4 = 16, 2^5 = 32, etc. Notice that the decimal equivalents of the binary powers double as the value of the power increases by 1. A table of the first 16 powers of 2, which we will use often, would look like this.

 2^15  2^14  2^13  2^12  2^11  2^10  2^9  2^8  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
32768 16384  8192  4096  2048  1024  512  256  128   64   32   16    8    4    2    1

If you add up all of the values in the second row of the table, the total comes to 65,535, the largest 16-position number in binary; counting zero as well, there are 65,536 different values that 16 bits can represent.

Each 1 or 0 that can occur in a binary number position is called a bit, which is short for Binary Digit. Since we can have a 0 or a 1 in each of the binary power positions, we call these bit positions, and name them after the power of two for that position. So, on the extreme right of the table, we have a position that represents 2^0, and we call it "bit position 0". The next position to the left represents 2^1, so we call this "bit position 1". Similarly, at the far left end of the table we have "bit position 15". The use of the term "bit position" becomes important in programming and in dealing with computer hardware-software interaction, where the bit positions represent locations within an 8-bit byte or 16-bit word, and we are interested in whether a specific bit position contains a 1 or a 0.

When dealing with actual numeric values, it is convenient to understand the relationship between decimal and binary. From the table you can see that there is a decimal equivalent value for each bit position that corresponds to the power of two for that position. If we wish to convert a binary number to decimal, we simply add up all the decimal equivalents for those bit positions that contain binary 1's, and ignore those that contain 0's.

BIT POSITIONS          7    6    5    4    3    2    1    0
DECIMAL EQUIVALENTS  128   64   32   16    8    4    2    1

                 9     0    0    0    0    1    0    0    1
                76     0    1    0    0    1    1    0    0
               176     1    0    1    1    0    0    0    0
               135     1    0    0    0    0    1    1    1

TO CALCULATE THE DECIMAL VALUE OF A BINARY NUMBER, add together the decimal values of all the bit positions that contain binary 1's.
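
That rule translates directly into a few lines of code. The following Python sketch (illustrative only) totals the decimal weights of the bit positions that contain 1's, then checks the answer with the language's built-in conversion:

    bits = "10110000"                      # 176 in the table above

    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position         # add the weight of this position
    print(total)                           # 176
    print(int(bits, 2))                    # built-in conversion agrees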

TO CALCULATE THE BINARY VALUE OF A DECIMAL NUMBER,

1. By inspection, determine the largest power of two decimal equivalent that will successfully be subtracted from the given decimal value (successful means that the subtraction returns an answer that is either positive or equal to zero).

2. Subtract the decimal equivalent of this power from the given number. Keep record of this successful subtraction by placing a 1 in the bit position of that binary power.

3. Now try to subtract the next smaller power of two from the result of step 2. It may be too big; if so, place a 0 into the bit position for this power of two. If the subtraction is successful, place a 1 into that bit position.

4. Continue as in step 3 until the given decimal number is used up. You should end up at the 2^0 position.
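
The four steps above can be sketched in Python as follows (the 8-bit width is an assumption made for illustration):

    number = 135                           # the value to convert
    bits = ""
    for position in range(7, -1, -1):      # try the largest power first
        power = 2 ** position
        if power <= number:                # the subtraction would succeed
            bits += "1"
            number -= power
        else:                              # power too big for what is left
            bits += "0"
    print(bits)                            # 10000111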

The binary system is the basic method of all computer counting, but it generates numbers that become very wide very fast, and this leads to human error. Two other number systems have been used to generate a shorthand that makes the handling of larger numbers easier than with pure binary. These are Octal and Hexadecimal.

The octal number system is based on the number 8. There are 8 symbols possible in each magnitude position: 0, 1, 2, 3, 4, 5, 6, and 7. This system was widely used in earlier computers, but has been replaced by the hexadecimal system for the most part. The hexadecimal system is based on the number 16, which means that there are 16 different symbols and values that can be placed in each magnitude position. These are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. Note that we use letters when we run out of the decimal numerals.

Unlike the decimal system, the octal and hexadecimal systems work beautifully with the binary system because their natural place to carry into the next higher magnitude position coincides with that of the binary system. Look at the locations where the octal number or the hexadecimal number rolls over to a higher position, and you will see that it is in the same place as binary. Decimal values, however, do not carry at the same place. Therefore, there is a direct correlation between binary and octal or hexadecimal, but not between binary and decimal. This is why the decimal numbers entering a computer are usually immediately changed to a binary or hexadecimal value, worked with in that form by the program, and the answers returned to decimal just before they are returned to the outside world.

BINARY                 OCTAL      HEXADECIMAL   DECIMAL
2^4 2^3 2^2 2^1 2^0    8^1 8^0    16^1 16^0     10^1 10^0
 16   8   4   2   1      8   1      16    1       10    1

  0   0   0   0   0      0   0       0    0        0    0
  0   0   0   0   1      0   1       0    1        0    1
  0   0   0   1   0      0   2       0    2        0    2
  0   0   0   1   1      0   3       0    3        0    3
  0   0   1   0   0      0   4       0    4        0    4
  0   0   1   0   1      0   5       0    5        0    5
  0   0   1   1   0      0   6       0    6        0    6
  0   0   1   1   1      0   7       0    7        0    7
  0   1   0   0   0      1   0       0    8        0    8
  0   1   0   0   1      1   1       0    9        0    9
  0   1   0   1   0      1   2       0    A        1    0
  0   1   0   1   1      1   3       0    B        1    1
  0   1   1   0   0      1   4       0    C        1    2
  0   1   1   0   1      1   5       0    D        1    3
  0   1   1   1   0      1   6       0    E        1    4
  0   1   1   1   1      1   7       0    F        1    5
  1   0   0   0   0      2   0       1    0        1    6
  1   0   0   0   1      2   1       1    1        1    7
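
A language with built-in base conversions makes the alignment in this table easy to verify. This short Python sketch (illustrative only) prints each value in decimal, binary, octal, and hexadecimal:

    for n in range(18):                    # the same rows as the table
        print(f"{n:>3}  {n:05b}  {n:02o}  {n:02X}")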

Coding Schemes: Given that data are stored or passed along inside a computer as binary bits, it soon became obvious that a method of organizing the bits into groups to represent letters, numbers, and special characters was needed. Although the process of calculating with binary digits is at the root of the design of the system, a great deal of data are represented as letters, not numbers. Therefore, several codes have been developed over the years to deal with letters and special characters.

Baudot Code is named after Émile Baudot, an engineer in the French Telegraph Service. He developed the five-bit code named after him that became the standard method of sending data between teletype machines as these became available. A teletype is essentially a mechanical typewriter and keyboard connected to a similar unit at a remote distance, the two communicating by sending telegraph-like bit patterns over telegraph lines. Operating the keyboard on one terminal would send five-bit groups over the wires, where they would actuate the typewriter of the remote terminal. This technique was standard for the Western Union telegraph service and others during the first half of the twentieth century.

American Standard Code for Information Interchange (ASCII) is a code that grew out of the expansion of digital devices that needed to communicate, but which needed a greater range of representable characters than Baudot code could provide. This code was developed and agreed upon by a consortium of companies that acknowledged the need for competitors to be able to communicate. The code comes in two versions, ASCII-7 and ASCII-8. ASCII-8 is little used now, but finds use in communications overseas where older equipment is still in service. ASCII-7 is the code most used by minicomputers and microprocessor-based machines, including personal computers. The seven bits of each byte used can represent 128 different combinations of printable and non-printable characters, the latter used to control equipment rather than to print. It has many of the earmarks of the earlier Baudot code in that it grew out of the teletype paradigm. The code can represent both upper and lower case letters as well as numbers. However, the bit patterns that represent the numbers are not binary-aligned (similar to the problem with decimal discussed above). Therefore, a translation function is almost always included in programming: the data are brought into the system or sent from the system as ASCII, but are used in computations as pure binary.
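
The translation can be seen in miniature in Python (an illustrative sketch; the character '5' is an arbitrary example):

    print(ord("5"))             # 53: the ASCII code for the character '5'
    print(ord("5") - ord("0"))  # 5: translated to a pure binary value
    print(chr(5 + ord("0")))    # '5': translated back for output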

IBM PC ASCII is essentially the ASCII-7 code, and the lower 128 bit combinations are used in PC I/O devices such as screens and printers as usual. However, in an 8-bit byte, the 7-bit code leaves a bit unused. In some systems, this can be used as a parity bit or ignored. However, IBM elected, when the PC was designed, to use the 8th bit to double the ASCII-7 code range (remember, adding another bit position to a binary number doubles the number of combinations you have). The new 128 characters thus created were used by IBM for characters for non-English languages, Greek letters and symbols for mathematics and science, and simple graphics to form borders and grids on screen displays. Although computer graphics has gone far beyond this phase of display, the PC ASCII code is still used as a reference for simple text work.

Extended Binary Coded Decimal Interchange Code (EBCDIC) was first used by IBM when they introduced System/360 in 1964. This code uses all 8 bits of the byte for 256 combinations that represent letters, numbers, and various control functions. The decimal equivalent numbers are binary aligned such that EBCDIC numbers coming from an input device can be directly fed to the processor and into computations without translation. The code is based on an earlier Binary Coded Decimal (BCD), which was the code used in earlier IBM products such as the 1401. This code was a 6-bit code in which the decimal numbers that were to be involved in calculations were binary-aligned.

There are a variety of other coding systems, and the internal workings of each processor uses the bits of bytes and words in many different ways. Watch for variations on these themes.

Error Checking is used extensively in computers to make sure that the answers you are getting are correct. The validity of the data and the results of the computations overall is referred to as Data Integrity. Are the answers really correct? Error checking can be done with hardware and software. Usually a system has several different implementations of both to ensure integrity.

Vertical Redundancy Checking (VRC) or Parity Checking is a means of counting the bits that are set to 1 in every byte or word of a data stream. Suppose a magnetic tape drive is reading a record from tape. The bits of the bytes of data on the tape stretch across the tape width (vertically), not along the tape (horizontally). As each byte reaches the processor, the number of 1's in it is counted. We are not interested in the binary value of the byte, but rather in whether an even or odd number of bit positions contain 1's. In an Odd Parity system, we want the number of 1's to be odd, that is, one bit, three bits, five bits, etc. If we count them and find that there is an even number of bits set to 1 (no bits, two bits, four bits, six bits), we turn on a special bit called the Parity Bit or Check Bit to ensure that the number of bits in the byte is odd. If we find that the number of bits is already odd, we leave the parity bit off. So, we have a bit that goes along with every byte that contains no value, but is used to ensure parity. An Even Parity system works similarly, except that we use the parity bit to ensure that the total number of bits in the byte is even.
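
As a sketch of the idea, here is an odd-parity generator in Python (illustrative only; in practice the parity bit is computed by hardware as the byte passes by):

    def odd_parity_bit(byte):
        ones = bin(byte).count("1")        # count the 1's in the data bits
        return 0 if ones % 2 == 1 else 1   # already odd? leave parity off

    print(odd_parity_bit(0b01001100))      # three 1's: parity bit stays 0
    print(odd_parity_bit(0b01001101))      # four 1's: parity bit set to 1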

Longitudinal Redundancy Checking (LRC) is a similar checking system that counts the number of bits set to 1 horizontally along the data stream. We are not interested in each byte, but rather in one of the bit positions of all bytes, say bit position 3. As bytes pass by, we add up all the bits in bit position 3 that pass. We do this for the other bit positions as well. At the end, we have a character that represents the summation of all bits in each bit position for that data stream. This character, called the LRC Character, is written on the tape, or follows the data transmission from the sending end to the receiving end. As the data arrive at their destination, a similar LRC character is gathered. The two are compared, and if they match we assume that the transmission or tape reading was OK. If they do not match, we assume a reading or transmission error.
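
If the per-position sums are kept modulo 2, the LRC character is simply the exclusive-OR of all the bytes in the stream. A Python sketch under that assumption:

    def lrc(data):
        check = 0
        for byte in data:
            check ^= byte                  # running parity of each bit column
        return check

    stream = [0b01001100, 0b10110000, 0b10000111]
    print(bin(lrc(stream)))                # the LRC character for the stream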

The problem with both of these methods is that if an even number of bits is picked up or dropped during the transmission or reading of the media, it is possible for errors to go undetected. Therefore, on magnetic recording systems and some networking, a Cyclic Redundancy Check (CRC) is used. The CRC check character is gathered similarly to the LRC character above, but it is processed by a shift-and-add algorithm rather than by simple addition. The result may be more than one check character in length. When all three check methods are used and no errors are found, the assumption is that the data are clean.
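
The exact CRC algorithm varies from system to system. The Python sketch below shows one common shift-and-divide form, an 8-bit CRC; the polynomial 0x07 is an assumption chosen for illustration, not one named in these notes:

    def crc8(data, poly=0x07):
        crc = 0
        for byte in data:
            crc ^= byte
            for _ in range(8):             # shift once per bit
                if crc & 0x80:             # high bit set: shift and divide
                    crc = ((crc << 1) ^ poly) & 0xFF
                else:
                    crc = (crc << 1) & 0xFF
        return crc

    print(hex(crc8([0b01001100, 0b10110000])))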

The Central Processing Unit

The Central Processing Unit (CPU) is the heart of the computer system. It contains all the circuitry necessary to interpret programs that define logical processes that the human programmer wants to do. It consists primarily of electronics which implement logical statements. These statements are worked out in Boolean Algebra, a non-numeric logical algebra that defines the logical relations of values to each other.

The CPU is responsible for the interpretation of the program and, following the instructions in the program, causes data to be moved from one functional unit to another such that the results desired by the programmer are obtained. Input data are given to the CPU and are processed by being moved about within the CPU's functional units, where they undergo logical or numeric changes along the way. When the processing is done, the data are returned to the human world as output data.

There are historically two designs that have been used in CPU's. The first dates from the time of John von Neumann, and may be referred to as a "dedicated system". This system has circuitry that is dedicated to specific purposes - an adding circuit that does addition, a subtracting circuit that does subtraction, a circuit that only compares, and so on. None of the circuits are active except the one that is needed at the moment. This is wasteful of circuitry, and it makes the system larger and makes it require more power.

The second type of system appeared commercially with the advent of IBM's System/360 in 1964. This system may be defined as "non-dedicated". The individual circuits needed for discrete functions in the earlier machines were replaced by a single multipurpose circuit that could act like any of them depending on what it was told to do. This circuit was called the Arithmetic Logic Unit (ALU). It could act like an adder, a subtractor, a comparator, or any of several other functions based on what it was told to do.

A block diagram of a modern CPU includes the following functional units:

• Registers: Registers are groups of circuits called bistable multivibrators, or flip-flops, for short. These are circuits made of pairs of transistors that have the ability to remain stable in one of two logical states. They can be said to contain a binary 0 or 1 at any specific time. Groups of flip-flops can be used to store data quantities for a short period of time within the CPU; 8 flip-flops could store one byte, and 16 could store one word.

• General Purpose Registers (GPR) are groups of flip-flops that act to hold bytes or words for a period of time. Unlike most registers in the system, these registers are visible to the programmer as he/she writes the instructions to implement the program. The programmer can refer to these registers in the program and put data into, and take data out of, them at any time.

• The Arithmetic Logic Unit (ALU): This unit has responsibility for all of the arithmetic and logical functions of the system. It is composed of one fairly complicated circuit that can act like any of several types of mathematical or logical circuits depending on what it is directed to do. This device has no storage capability, that is, it does not act like a register or memory device. It introduces a small delay as the data passes through it, called transient response time.

• The Instruction Register receives the incoming instruction and holds it for the duration of a machine cycle or longer. It makes the instruction available to the system, particularly the Control Unit.

• The Program Counter is a register that keeps track of the location of the next instruction to be processed after the current one is finished. It contains memory addresses in binary form.

• The Control Unit (CU) accepts the instruction from the Instruction Register and, combining the instruction with timing cycles, causes the various functional units of the CPU to act like sources or destinations for data. The data moving between these sources and destinations may be processed on the way by moving through the ALU.

• System Clock: The system clock is a timing cycle generator that creates voltage waves of varying periods and durations which are used to synchronize the passage of data between functional units.

• Input/Output System: This system provides the means by which input data and instructions enter the system, and output data leaves the system. Remember that in a Von Neumann machine, the data and the instructions that direct its processing sit side-by-side in the same memory device.

• Main Memory is contained within the CPU, and stores the data and instructions currently needed by the program execution. The speed with which the memory and the rest of the system communicate is a critical issue, and the center of much development. This may also be called Primary Storage.

Here is more detail on some of these items.

The Control Unit has undergone major design changes over the years. The current procedure is to make the CU essentially a computer within a computer. Just as the CPU has I/O devices between which it can move data, the CU treats the functional units of the CPU as sources and destinations. The CU takes the instruction from the Instruction Register and the timing cycles from the System Clock. It combines these by stepping through what amount to microinstructions contained within its own circuitry. By following the microinstruction pattern built into itself for a given instruction, the CU implements the instruction desired by the programmer, moving data between and through the various functional units in step with the system clock. The effect is one of doing the required instruction as far as the outside world can see.

An example would be the process of executing an Add instruction. The programmer writes an Add instruction along with additional information, such as where in the system the two data items to be added are located. Given this as a starting point, the CU starts to follow its own set of microinstructions to find the two data items, pass them through the ALU to accomplish the Add, and catch the sum at the output of the ALU. It then returns the sum to a functional unit such as a register to hold the answer for the next instruction.

Because the sequence of events in the earlier dedicated systems operated at the binary level, and because the programmer and technician originally could work directly with the circuitry either from the front panel via lights and switches or via a program, the lowest or binary level of programming became known as Machine Language (ML). With the advent of the microprogrammed Control Unit, the instructions contained within the CU became known as microprogramming or microcode. This means that currently each Machine Language instruction - the lowest level that the programmer normally sees - is carried out by microprogram instructions or steps. The technician can work at the microprogram level, but the programmer typically would not. When programming logic and instructions are embedded permanently into the circuitry of a device, they are referred to as firmware.

The System Clock is a timing signal generator that creates a variety of voltage waveforms used to synchronize the passage of data through the functional units. There are two types of electronics or logic in the system, synchronous and asynchronous. The word synchronous means "in step with the passage of time", while asynchronous means "not in step with the passage of time". Synchronous circuitry is that which has a clock timing signal of some kind involved with it. GPRs, for example, are synchronous, because they accept data into themselves at a particular moment or clock time. The ALU is an asynchronous circuit - it doesn't store data, it passes it through as quickly as possible and does not rely on a clock signal to do so.

In synchronous systems, the system timing is divided into regular intervals of time called Machine Cycles. All system activity is based on the elapse of the machine cycles. These are roughly divided into two types: Instruction Cycles, or I-Cycles, and Execution Cycles, or E-Cycles. Instruction cycles are those that are responsible for obtaining an instruction from the main memory, placing it into the Instruction Register, and starting the CU's process of analyzing the machine language instruction to determine what microprogram to execute. By the time the I-Cycles are completed, the instruction is ready to execute, and the system already knows where the next instruction to execute will be found in main memory after the current instruction is completed. E-Cycles have the responsibility of actually causing the instruction to be accomplished. They involve a series of microinstructions that move data around between the functional units of the system so that the desired result is achieved. E-Cycles must recognize when the instruction has run to completion, and hand off the system to the I-Cycles again for the next instruction.
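
The alternation between the two kinds of cycles can be sketched as a simple loop. This is a toy illustration: the three-instruction "machine" and its opcodes are invented, and a real CPU decodes binary instruction words rather than Python tuples.

    # "Main memory" holding both instructions and (immediate) data
    memory = [("LOAD", 5), ("ADD", 3), ("HALT", 0)]
    accumulator, pc, running = 0, 0, True

    while running:
        # I-Cycles: fetch the instruction and advance the program counter,
        # so the location of the next instruction is already known
        opcode, operand = memory[pc]
        pc += 1
        # E-Cycles: actually cause the instruction to be accomplished
        if opcode == "LOAD":
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "HALT":
            running = False                      # nothing to hand back to

    print(accumulator)                           # 8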

In modern systems, including those based on the microprocessor device, these cycles can be overlapped: the I-Cycles for instruction number 2 are getting underway while the E-Cycles for instruction number 1 are being performed. This overlap, known as pipelining, is a simple example of parallel computing.

Main memory or primary storage is tightly connected to the dataflow of the CPU. It is a primary source for instructions and data needed for program execution and a primary destination for result data. For the most part, the programmer has little else to specify than a main memory location or a GPR.

Data are stored in main memory at locations called addresses. Each address can contain one or more bytes of data. If the smallest lump of data that can be referred to with a single address is a byte, then the machine is referred to as byte-addressable. If the smallest lump of data that a single address can refer to is a two-byte word, then the system is called word-addressable. Some special purpose devices can use an address to refer to a single binary bit within a byte in the memory. These machines are called bit addressable.

The number of addresses, and therefore the number of storage locations, a memory system can have is determined by the width of the address bus of the CPU. This total number of addresses is called the address space. The address bus is a set of parallel wires that distribute binary 1's and 0's to the memory system in a synchronous manner. Each additional bit of width given to the address bus doubles the size of the memory possible. An address bus one bit wide, A0, could specify one of two addresses, number 0 and number 1 (remember there are two states possible for a binary bit). If the address bus were two bits wide, A0 and A1, then there would be four addresses in the memory system because there are four possible combinations of two bits: A0=0, A1=0; A0=1, A1=0; A0=0, A1=1; and A0=1, A1=1. If we have three bits of address bus, A0, A1, and A2, then we would have 8 addresses possible, and so on. Following this plan, what would be the address space for address bus widths of 16, 20, 24, and 32 bits?
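
The answers follow from the doubling rule: an n-bit address bus gives 2 to the nth power addresses. A few lines of Python confirm the figures:

    # Address space = 2**n for an n-bit address bus
    for width in (1, 2, 3, 16, 20, 24, 32):
        print(f"{width:2d}-bit bus: {2 ** width:,} addresses")
    # 16 bits -> 65,536 (64K); 20 -> 1,048,576 (1M)
    # 24 -> 16,777,216 (16M); 32 -> 4,294,967,296 (4G)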

The interaction of the main memory with the rest of the CPU is a critical factor in the overall performance of a computer. Typically, when core storage was used, the speed of the core system was slow enough compared to the speed of the electronic circuitry that the electronics had to wait for the memory to respond to a request. The CPU would add machine cycles of wasted time (in microprocessors, called wait states) to slow the circuitry down and give the memory time to respond. With the advent of microprocessors and solid state memory, we still have this problem because the speed of the microprocessor device is still significantly greater than that of the main memory connected to it. We overcome this problem by the addition of a cache memory. The cache is a small amount of high speed memory that is able to keep up with the processor with no waiting. It interfaces the processor at its speed to the main memory at the slower speed. This is tricky to do, and there are a variety of cache controller devices and methods currently in use to make this process as efficient as possible. With a good cache controller, it is possible for the memory to have the needed data or instruction information available to the processor about 99% of the time. This percentage is called the hit rate.
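
A little arithmetic shows why a high hit rate matters. The access times below are assumptions chosen for illustration, not figures from any particular system:

    # Effective access time = hit_rate * cache_time + miss_rate * memory_time
    hit_rate = 0.99
    t_cache, t_memory = 10e-9, 70e-9             # assumed: 10 ns cache, 70 ns memory

    t_effective = hit_rate * t_cache + (1 - hit_rate) * t_memory
    print(f"{t_effective * 1e9:.1f} ns")         # 10.6 ns, nearly cache speed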

Currently, there are two fields of processor design of which you should be aware. The first is called the Complex Instruction Set Computer (CISC) approach. This is the traditional mainframe approach, and the System/360 was famous for it. The CISC machine uses complex instructions to do its work. One instruction might cause one number to be incremented, another to be decremented, the two results compared, and a change in execution direction (jump or branch) depending upon whether the two numbers are equal. That is a great amount of work for one instruction to do, but it is fairly easy to implement, since with microprogramming it is an easy task to simply connect microroutines together to accomplish it. Such systems can be rather slow in execution, but are easy to program because they have what we call a "rich" instruction set. Most microprocessors, including the Intel machines, are of this type.

Reduced Instruction Set Computers (RISC) are designed in the reverse. These machines have a small number of simple instructions, but they execute very, very fast. Their electronics are hardwired or dedicated, as opposed to microprogrammed. The results of the complex instructions can be obtained by writing routines that implement the logic of the complex instruction using the small instructions. The result is that a RISC machine can execute at an overall faster rate, even though it seems to be doing more instructions to get the result. Various tricks with clock distribution, internal pipelining, and similar approaches are also used in the RISC design to further improve the throughput. RISC machines are finding use as large workstations for CAD, design, engineering, and related uses.

In a CISC machine, the ultimate throughput depends on how fast the ALU can be supported by the rest of the circuitry. Indeed, no matter how fast the support electronics are, if the machine has only one ALU, then it can execute only one instruction at a time. Parallel Processing involves a design in which there may be more than one ALU. This would allow more than one thing to be processed at one time, thereby increasing the performance of the system. The parallelism is not limited to just the ALU. It is possible to have more than one set of GPR's, data paths, and I/O paths as well. The primary difference between the Intel '386, '486, and Pentium is their internal architecture, which uses ever-increasing numbers of parallel functional units to increase throughput.

Another method of increasing throughput is the use of a Coprocessor. This device acts as a parasite on, and in concert with, the main processor. It cannot operate by itself; it shares, and uses at will, the bus access and control signals of the primary processor. The coprocessor is designed to do a specific set of small tasks, but to do them very fast. The best example is the 8087 math coprocessor from Intel, which works in concert with the 8086 processor. The 8087 has an additional set of instructions that it can perform, over and above the instruction set of the main processor. As the instructions enter the processor and coprocessor together, the 8087 watches for one of the instructions that belongs to its set. When such an instruction comes along, the 8086 hands control of the system over to the 8087 for it to do its thing. When the instruction or instruction stream is complete, control passes back to the 8086 again. The 8087 can deal with floating point numbers and very large numbers that would take the 8086 much longer to process.

Peripheral Devices, Character-based

Peripheral devices are those that support the processor by delivering data to it, taking results away, or storing data and instructions so that they can be accessed by the processor at any time. In this section we will discuss those peripheral devices that are primarily character-based, that is, they deal with data one character or byte at a time.

Source documents are those documents that come from the human world to the computer. They can be order sheets, sales tags, handwritten receipts, or an infinite number of similar things. They are empirical; that is, they are gathered at the source of the related activity, which may be miles from the nearest PC. Computer-usable documents are those pieces of paper or media that can be accessed by the computer's input/output devices without need for further preprocessing. These include the venerable punched card, optically read documents, magnetic stripe credit cards, or keyboard entry. In the old days, a major conversion had to occur to make the source documents computer-usable. Traditionally, the source documents were brought to the computer site, where they were read by a keypunch operator who generated a deck of punched cards that represented the data on the source items. This step consumed time and money. Therefore, a great variety of data entry techniques have been developed to eliminate the translation process: credit cards with magnetic stripes, optically read lotto tickets, and laser-scanned canned goods and potato chip packages, to name just a few.

Early methods of generating computer-usable documents centered around punching holes in things. These included the Hollerith punched card and paper tape, which was used in teletype systems and various early data recording methods. The punched card had twelve rows for holes, named 12, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 from the top down. The card was divided into 80 columns, left to right. The top three rows were called zones, and the bottom rows were called digits. If the area or field, or group of columns, of the card being discussed contained numeric data, then the 0 row was considered a number 0. If the field of the card being discussed contained alphabetic or alphanumeric information, the 0 row was considered a zone. The three zones could represent thirds of the alphabet, so that a punched card could contain numbers, upper case letters, or a few special characters. The holes were placed into the card by a keypunch machine, which was an electromechanical device with a keyboard and a card path through which the cards passed to be punched or read.

Using the keyboard as the human interface, other key-entry machines have come and gone. These include key-to-tape devices in which key data was written into 80-column records on magnetic tape (to match the organization of the punched card). They also include stations for key-to-disk and key-to-diskette entry. The first took information from the keyboard and placed it onto a fixed disk, while the second placed the data on a floppy diskette. Today we normally assume that a PC or PC-like station will be the entry point and that it will be connected to the computer by a network of some kind.

Other types of character entry besides keyboards include the mouse, a small movable device whose position on the table is represented by a pointer on the screen; Optical Character Recognition (OCR) devices that read printed characters from a medium by doing pattern recognition; Magnetic Ink Character Recognition (MICR) used in the banking industry to encode values on checks; Light Pens that are used to indicate a certain point on the screen to which the user wishes to call the computer's attention; Touch Panels which can receive input in the form of a person's finger touching a point on the screen; Bar Codes which are scanned by laser to generate a pattern of 1's and 0's that can be interpreted as binary data; Point-of-Sale (POS) terminals which act like computerized cash registers and checkout stands, where commercial selling is done, and which might have other I/O devices like laser scanners included within them; and Voice Recognition and generation which attempts to communicate with the user by the spoken word.

The word terminal refers to any of a wide variety of keyboard-plus-display machines that can interact with a user on behalf of a computer. The earliest was the teletype, which could display information to the user by printing it on paper. Video display terminals came into their own only in the early 1970's, because the semiconductor memory devices needed to store the image for the screen were not plentiful until that time. We now have completely intelligent terminals such as the PC that can do their own processing most of the time, and need to communicate only at certain times.

Video Display devices are those that can provide a text or graphic image on a surface, usually a Cathode Ray Tube (CRT). The text image is stored as ASCII data, and a refresh circuit circulates through the storage device, or buffer, a fixed number of times each second to generate the image on the screen. The circulation of data from the buffer is synchronized with the vertical and horizontal timing of the raster on the display tube so that a stable display of letters is produced.

In graphics displays, the field of the CRT's screen is divided into picture elements, or pixels. A pixel is a dot of light on the screen, or the place where such a dot of light can be. Resolution is a word that indicates the number of pixels horizontally across the screen, and the number of pixels vertically down the screen, that a given video display can produce. A display with a Video Graphics Array (VGA) image will have 640 pixels horizontally and 480 vertically. This is a standard reference value in current PC technology.
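
The pixel counts involved are easy to compute. The one-byte-per-pixel figure below is an assumption for illustration (it corresponds to a 256-color mode):

    # Pixels and buffer size for a 640 x 480 VGA image
    width, height = 640, 480
    pixels = width * height
    print(pixels, "pixels")                      # 307200
    print(pixels // 1024, "KB at 1 byte/pixel")  # 300 KB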

Each pixel has certain characteristics. The first is its size, or dot pitch, the diameter of the dot of light in millimeters (e.g. 0.28 mm dot pitch), which is a function of the manufacturing process of the CRT. The second is the number of colors it can represent, which is determined by the video display adapter to which the display is attached. It is important to make sure that the display and the adapter to which it is to be attached are compatible in sweep speeds and interfacing. It is possible to damage a display by connecting it to an incompatible adapter.

Displays that can generate only one color are called monochrome displays, while those that do colors are called color or polychrome displays. The early PC had two monitors and display adapters available. The Monochrome Display Adapter (MDA) and its display generated a green image with a characteristic character shape that is still with us, but this display was of higher resolution than a regular television and therefore was not compatible with television monitors or standards. It was modeled after IBM's mainframe display device, the 3270. The Color Graphics Adapter (CGA) and its display were designed to be NTSC compatible so that a buyer could use a color home television as a display. The resolution was poor, but the device could generate graphics and color reasonably well. In 1984, IBM introduced the Enhanced Graphics Adapter (EGA) with the PC/AT. This allowed the resolution of the MDA device to be viewed in color. The issue of the use of the home television had by this time become unimportant. In 1987, IBM introduced the PS/2 product line and with it the VGA device. This set a baseline standard for display resolution and performance. The Extended Graphics Array (XGA) was an attempt by IBM to define a standard for higher-than-VGA resolutions, but most makers did not adhere to the specifications. The Super VGA (SVGA) was proposed instead, with a pixel map of 800 (h) x 600 (v) pixels. This has since been adopted by all makers, including IBM, but it was never a fully agreed-upon standard.

Liquid Crystal Displays (LCD) and their derivative, the Active Matrix Display, also called the Thin-Film Transistor Display (TFTD), make use of a liquid crystal sandwiched between two pieces of glass that have been coated with conductive transparent oxides. By controlling the voltage between the two pieces of glass, the liquid crystal can be made either opaque (no light passes through) or transparent (light passes through). By installing a transistor at each pixel location on the glass, the TFTD can increase the contrast ratio of the opacity of the crystal, generating a clearer, crisper image that changes instantly instead of slowly as the LCD does.

The term printer refers to a variety of devices that place characters on a receiving medium. The methods of doing this come under the categories of impact printing and non-impact printing.

Impact printing has its beginnings in the press of Gutenberg, which used fonts, or the shapes of the desired characters, carved into blocks of wood in high relief. Each letter had to be carved by hand. The letters were placed together in a frame so that they were compressed on all sides and did not move. The frame was placed onto a rolling carriage and ink was spread onto the tops of the fonts. Paper was then placed onto the fonts; the carriage was rolled under a heavy metal plate, or "platen", which was pressed down onto the back side of the paper. The ink on the fonts was thus transferred onto the paper in the shape of the fonts. While this method used pressure rather than a fast-moving impact, it was nonetheless the beginning of mechanical printing as we know it.

Today, a wide variety of printers use some variation of this process. They all have these five things in common:

1. The character shape, or font, which can be carved or cast from metal or plastic, or formed by a pattern of dots. The term "font" also applies to a space where a character could be but is not.

2. Paper, or the medium to which the coloring element or ink is transferred. Paper is the most common, but printing can be done on plastic, metal, wood, or just about any surface.

3. Ink, usually found in the form of a ribbon that has been saturated with ink. Ribbons can be made of many fibers, but the standard now is nylon.

4. The platen, or some related device that provides a backstopping action to the printing movement.

5. Physical motion, which brings all of the above together with sufficient pressure or force to cause the ink to be transferred to the receiving medium.

Almost always we will find that two or more of the five elements of impact printing are combined into one physical mechanism. Examples include:

• Typewriter, in which the font, in the form of a cast slug of metal on the end of an arm, is thrown toward the ribbon so that it impacts the ribbon to transfer the ink to the paper. The rubber roller around which the paper wraps is called the platen, and it serves to backstop the flying key. The font and physical motion are combined into one mechanism.

• Dot matrix printer, similar to those found in the student labs. In this case, the font consists of a dot pattern that is formed by striking the ribbon against the paper with the ends of a set of wires that are electro-mechanically moved forward, then retracted, at high speed. The font and physical motion are represented by the print head with the wires inside. The platen is a small smooth metal piece behind the paper.

• Drum printers are found on larger systems and have largely been replaced by large laser printers. They have a metal drum whose surface is covered with fonts in lines and circles such that each time the drum makes one complete revolution, every possible font is exposed to every possible character position. The paper is pushed from behind by a hammer mechanism that causes the paper to move forward and be pinched between the drum and the ribbon. The combination here is the platen, formed by the drum, and the fonts on it.

• Chain and Train printers work similarly to drum printers. However, instead of a drum with characters on it, the fonts are made on metal slugs that travel around an oval race track. The chain, where the slugs are hooked together, or the train, where they are not hooked but push each other around, spins across the width of the paper. A hammer mechanism for each print position fires from behind the paper to press the paper against the ribbon on the other side, which is then pressed against the font as it passes by. This method combines font and platen.

Non-impact printing uses modern techniques to form characters of a contrasting color on a medium. There have been many non-impact printing methods over the years; the three most common now are thermal printing, where heat is used to form the characters; optical printing, where light is used; and ink jet printing where ink is simply sprayed onto the paper.

Thermal printing involves a specially treated paper that has a light background tint, but which can turn darker with exposure to heat. The heat is applied as a pattern of dots created as a print head passes slowly over the paper surface. The print head consists of a row of diodes encased in glass bubbles that are turned on and off very quickly, and which can heat up or cool down almost as fast. As the glass bubbles on the printhead contact the paper, the current is turned on and then off quickly, causing the glass bubble to heat up, then cool down rapidly. This in turn causes the area of the paper that was in contact with the bubble at the time of the heating to turn darker, typically either blue or black. This method is used in desk calculators and many credit card and cash register applications.

Optical printing is best illustrated by Laser Printers. These devices have a rotating drum that is covered with a cadmium sulfide compound that is sensitive to light. When light shines onto the drum, the surface where the light impinges becomes electrostatically charged. As the drum turns, the charged area is exposed to a very fine black powder, or toner, which sticks to the areas where the charge was placed. This area is then further rotated to a point where the toner is transferred to a piece of paper as the two are pressed together. Finally, the paper is heated as it exits the machine to seal the ink into the paper. The character shapes are drawn onto the rotating drum by a focused laser beam, and this beam can be steered to create the desired pattern of dots. The characters are not whole fonts - they are formed by very small dot patterns, typically at a resolution of 300 x 300 dots per inch.

Ink jet printing involves the spraying of minute ink droplets onto the paper as a spray nozzle moves across the width of the page. The ink is pumped under pressure to a nozzle that generates a very fine stream. This stream is passed through electrodes that are charged with an ultrasonic signal so that the stream breaks into tiny droplets. These are further "steered" by more electrodes that guide the droplets up or down as the print head makes its excursion. The result is finely formed printing that can come in colors and do excellent graphics.

Plotters are large printers that generate drawings or graphics as opposed to print. Pen plotters use real ink pens in varying sizes and widths that are moved over the paper in an X-Y fashion to generate the desired line drawing. These can be very fast, but have certain limitations on accuracy and resolution. Photoplotters are essentially giant laser printers (although the original ones did not use lasers) which use a dot matrix of light points to generate a high-contrast pattern on film. These are used to create printed circuit boards and integrated circuit device masks.

A few more terms to round out the printer discussion:

• Paper Feed techniques include methods of moving paper through a printing mechanism. The most common form is called a pressure roll or pressure platen technique, in which the paper is pinched between two rubber rollers or a roller and a platen which is then rotated to move the paper. Single sheet or cut sheet paper is most frequently used in these machines. Tractor feed is used in high speed paper motion to move the paper by tractor pins that pass through holes along the edge of the paper so that the paper is mechanically positively moved. Paper used in this type of machine is called continuous forms.

• Dot Matrix Printing indicates that the form of the character is made up of a pattern of dots in an X-Y arrangement rather than complete unbroken lines. Most printers today use this technique, as the resolution has improved to the point where it is hard to tell the real thing from the dot pattern.

• Near Letter Quality (NLQ) is a term used to indicate dot matrix printed output that is very close in quality to the results that could be obtained by whole-character, that is, impact, printing.

SECTION 3 - Hardware, Part 2

Data Storage Organization

When confronted with storing data, and particularly large amounts of data, it is necessary to organize the bytes of information in a way that makes sense to the nature of the data, and also to the mechanism in which the data are being stored. The user wants to see the information in a way that makes sense to him or her. For instance, if the user wishes to keep a name and address list of club members, the interaction between the user and the computer should be in a way that makes sense to the nature of the list. That is, to add a new member to the list, the user would enter the member's name on the first line of a screen, the first address line on the second line of the screen, and the city, state and zip code on the third line. This would match a typical hand-addressed envelope. The data, however, are not stored in handwritten form, but as bytes on magnetic disk. The disk drive, being a mechanical device, has certain characteristics and limitations that must be met if it is to be useful. The data must therefore be converted to a different organization than the simple three lines when they are sent from the screen/keyboard of the system to the drive. They must be reorganized to fit the limitations of the disk drive device. Also, when data are retrieved from the drive later, they must be converted from a disk drive organization to a different organization that better fits the understanding of how the user will deal with them. It is up to the computer and the operating system between the user and the drive to make these organization conversions.

The original standard for data organization was the Hollerith punched card. This piece of stiff paper was arranged as 80 vertical columns of twelve horizontal rows each. The card could therefore contain as many as 80 letters or numbers. Because it was of fixed length, the card and the 80-character grouping were referred to as a unit record. The gray-covered electromechanical machines used to read, punch, process, and print the data on the cards were called unit record machines. From the inception of IBM up through the mid-1950's, unit record machines were the mainstay of the data processing industry.

In the early days of computers, the group of 80 characters was maintained as a reference quantity of data. The record, as a standard unit of data, was composed of one or more fields, which in turn were composed of one or more characters. An example would be a punched card or computer record that contained the entry of one person's name in a list of names for a club membership. The first field might contain the member's name, and be 20 characters long. The second field could be an address of 20 characters; the third might be a city name of 15 characters; the fourth field might be a state code of 2 characters; and the next field might be a ZIP code of 5 characters. Together, these 62 characters make up one member's address for the club roster.

Notice that in the membership list, the fields each represent one part of an address or identify the person. Together, all the fields make up a record that identifies the person and his/her address. The organization of these data must make sense to the user, the person working with the information. It is organized as one might arrange a holiday card list or other simple mailing list. It makes sense to the user to organize the information this way.

However, the storage device design is such that it doesn't know about the nature of the data; indeed, disk drives are dumb devices. The disk drive's electronics know only how to find tracks and sectors. So, in between the user at the keyboard and screen and the disk drive is the computer along with the operating system that together rearrange the organization of the data as they pass between the source and destination. If the data are going to the disk from the screen/keyboard, then the data are taken out of the mailing list organization described above which made sense to the user and arranged into
sectors so that the data can be written on the disk surface. When the data are retrieved, the computer and operating system reorganize the data in reverse. A major amount of work is done to accomplish these conversions. A significant amount of the operating system is dedicated to disk handling.

As technology progressed away from the punched card to screen and keyboard data entry, the unit record gave way to a more general arrangement of data coming from the source. The idea of characters, fields, and records remained. In addition, the records collectively were grouped into a file. So a file is one or more records of data. The number of characters in a field, and therefore in a record and file, can now be variable; we no longer need to deal with a fixed length of 80 characters. Programmers today deal with data storage in an infinite number of ways that make sense to the nature of the data being stored, be it accounting data, scientific data, school records, or word processing documents. However, when the data are sent to the disk, the system hardware and software must make the conversion to the arrangement that the disk drive can accommodate. The file is the unit that appears in the directory of a disk drive. If you issue the DIR command at a DOS prompt on a PC, the listing you get is at the file level. It is assumed that each of the files listed contains one or more records made up of one or more fields, which are in turn made up of one or more characters. To see what is inside a file, you must execute some sort of program or DOS command that will show the file to you.

A database is made up of one or more files that contain data of a related nature. Again, just how the data are arranged between these files is up to the programmer, who creates a file set that makes sense to the nature of the data and the nature of the use to which it will be applied. One of the files usually contains either all the data as a base reference, or, if not all the data, at least the essential data against which the other files may be referenced. This most important file is called a master file.

Fields come in different types, too. First, there is a key field, which is regarded in the database as being the one first looked at by the program that is using the data. For instance, to prepare the monthly meeting notice for the club membership, the corresponding secretary might define the ZIP code field of the mailing list records as the most important. This is because when mailing large numbers of fliers, the post office will charge less per piece if they are presorted in ZIP code order. When preparing the list for printing, sorting the records into ZIP code order via the key field will save the club mailing costs.
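
As a sketch of the idea, here is a small membership list sorted on its ZIP code key field in Python; the sample records are invented:

    # Sort membership records on the key field (ZIP code) before printing
    members = [
        {"name": "A. Smith", "zip": "62704"},
        {"name": "B. Jones", "zip": "04101"},
        {"name": "C. Brown", "zip": "73301"},
    ]
    for record in sorted(members, key=lambda r: r["zip"]):
        print(record["zip"], record["name"])     # fliers come out presorted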

Fields can be described by the nature of the data they contain. Fields which contain only alphabetic letters are called alphabetic fields; those that contain only numbers are called numeric fields; those that contain a mix of letters and numbers are called alphameric or alphanumeric fields. Alphabetic fields and alphanumeric fields contain data that usually are stored as-is. Numeric fields, however, are kept pure so that their contents can go directly to a mathematical processing routine. Numeric fields can also be compressed and stored in dense form; this saves disk space if the amount of numbers to be stored is large. There is also a logical field, composed of one or more bytes, whose contents or bit positions represent answers to "yes or no" questions. For instance, it would be possible to store a single byte in the record along with the name and address to indicate 8 different yes-or-no answers. These could include "has the member paid this year's dues? Yes or No", with a 1 for a yes and a 0 for a no.
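
Such a logical field is easy to sketch with bit operations; the flag names below are invented for the club example:

    # One logical-field byte holding up to 8 yes-or-no answers
    DUES_PAID  = 0b00000001                      # bit 0: paid this year's dues?
    NEWSLETTER = 0b00000010                      # bit 1: wants the newsletter?

    flags = 0
    flags |= DUES_PAID                           # record a "yes" for dues
    print(bool(flags & DUES_PAID))               # True
    print(bool(flags & NEWSLETTER))              # False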

Records have a set of characteristics as well. The most obvious is whether the record is a fixed-length record or a variable-length record. The fixed-length record is easy to deal with since all the records are the same length. This is easily seen in the punched card, where the data were physically a fixed length as well as logically. Dealing with this kind of record is easy to do in programming. Accordingly, this type of record storage is the most common and is used most of the time. With variable-length records, the records are not a fixed size; each can be longer or shorter than the last, because there is no reason to store blank characters if shorter, and there is a reason to store non-blank characters if longer. Programming with variable-length records is difficult because a method must be devised to determine where one record ends and the next begins - the programmer can no longer depend on a fixed number of characters per record.

At the file level, the method of approach to accessing data in a file can take several directions. The first question is how large the body of data to be stored in the file is. For example, a credit card company might have millions of customers, many of whom have multiple cards. How do you find one client's records among all those millions? Several ways of storing the data within the file address this question.

The simplest and most obvious way of storing data is the sequential file. This file contains records in an order sorted by a key field within the records. Again, a file sorted into ZIP code order is a good example. To find the address of a single person within the file, the program begins at the beginning, and looks at the first record to see if it is the one desired. If it is, the search is over quickly. However, if the first record of the file is not the one desired, the program then reads in the second record, and tries again. If the record we are looking for is close to the beginning of the file, it takes little time to find it. However, if the record we want is near or at the end of the file, it might take a long time to go through the thousands of records we don't want to find the one we do want. This usually is unacceptable in anything other than small data sets.
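
A sequential search is only a few lines of code, which is part of its appeal; the cost is in how many records may have to be read. A minimal sketch in Python:

    # Read records from the start until the key field matches
    def sequential_find(records, want):
        for position, record in enumerate(records):
            if record["zip"] == want:
                return position                  # found near the front: fast
        return None                              # read the whole file: slow

    records = [{"zip": "04101"}, {"zip": "62704"}, {"zip": "73301"}]
    print(sequential_find(records, "73301"))     # 2: all three records were read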

An improvement on the sequential method of file access is the indexed-sequential file. This method consists of two files. The first is the large file of many records that contains all the details about each club member, credit card client, or machined part. This file is in random order; it is not necessary to keep it organized. The only thing we need to do is to make sure that the records are filled in correctly. Then, we build a small file called the index file, which acts like the index in a textbook. At the beginning of the processing session, a pass is made through the large file, and the key field of each record, along with the position of that record in the large file, is stored in the index file. When complete, the index file contains two pieces of information about each of the master file records: the key field contents, and the location of that record in the master file. When we wish to find a particular record, we look it up sequentially in the index file - this takes little time because the file is small and the entries in it are short. When we find the item we want, the entry in the index file gives us the record location for that item in the big file. So we take this information and find the item in the big file directly, that is, without going through all the entries ahead of the one we want.
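
Here is the indexed-sequential idea sketched in Python, with the master file in random order and the index built in one pass; the records are invented:

    # Master file in random order; only correctness of the records matters
    master = [
        {"zip": "73301", "name": "C. Brown"},
        {"zip": "04101", "name": "B. Jones"},
        {"zip": "62704", "name": "A. Smith"},
    ]

    # One pass stores each key field along with its record's position
    index = {record["zip"]: position for position, record in enumerate(master)}

    position = index["62704"]                    # quick lookup in the small file
    print(master[position]["name"])              # direct access into the big file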

Another method of finding information in a large file is called a binary search. In this method, the large file is sorted by a key field in each record. This takes time, but it puts all the records in some sort of logical ascending order. Again, the ZIP code field in a large set of records is a good example. When we wish to find a particular entry in the file, we go to the record in the middle of the file, obtain its key field data, and compare it to the one we want. If the desired data have a higher value than the record obtained from the middle of the file, we know that the one we want is in the second half of the file, above our current location. We therefore know that the desired data are not in the first half of the file. Conversely, if the desired key field data are less than that of the middle record of the file, we know that the data we want are in the first half of the file. Immediately, we have eliminated half the file as not having the data we want.

We continue with the half of the file that contains our data, and again go to the middle of that group of records. Again, the item we want is either above the middle of the second half (the upper 1/4 of the file) or below it (the third 1/4 of the file). We can repeat this "divide and conquer" several times over until we zero in on the target record. This is a very fast method, and it takes roughly the same amount of time to find any record in the file, regardless of whether the desired record is at the beginning of the file, at the end of the file, or in the middle.
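
Here is the divide-and-conquer process as a short Python routine operating on key fields sorted into ascending order:

    # Each probe eliminates half of the remaining records
    def binary_find(sorted_keys, want):
        low, high = 0, len(sorted_keys) - 1
        while low <= high:
            middle = (low + high) // 2
            if sorted_keys[middle] == want:
                return middle
            if want > sorted_keys[middle]:
                low = middle + 1                 # target is in the upper half
            else:
                high = middle - 1                # target is in the lower half
        return None

    zips = ["04101", "62704", "73301", "90210"]
    print(binary_find(zips, "73301"))            # 2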

A Few Words About Magnetism

A student of basic electronics or physics soon is confronted with the ideas and theories behind magnetism. Unlike electronic current flow, in which actual matter, the electron, is moving, magnetism is concerned with pure energy levels that have no weight and take no space. As such, it is sometimes difficult for the student to visualize the ideas behind it.

Electron flow in a copper wire or other conductor is the result of a pressure placed on the ends of the wire that the electrons within the wire cannot resist. The electrical pressure is called Electromotive Force, or EMF, and its unit of measure is the Volt. EMF is created by storing a bunch of electrons at one end of the wire and a bunch of positive ions, or atoms missing electrons, at the other end of the wire. The electrons in the copper atoms within the wire feel an attraction for the positive-ion end of the wire and are repelled by the end that has too many electrons already. These electrons, therefore, tend to move toward the positive end of the wire. When electrons move, it is said that we have Current flowing in the wire. The unit of measure of current is the Ampere.

Electrons are spinning about their own axes as they move along the wire. This spinning creates a magnetic field between the poles of the electron, just like the earth's magnetic field between the North and South Poles. As the electron moves along, it takes its magnetic field with it. This traveling field is the basis of the science of electromagnetics, the science of magnetic lines of force created by the movement of electrons. It provides us with all the theory necessary to build motors, generators, electric lights, stereo sets, radio and television, and all the goodies of the plug-in world.

Magnetism is made up of lines of magnetic force. As we said, these are pure energy, not matter in motion. It is the same basic idea of energy as the light showering down from the fluorescent tubes in the classroom ceiling. If light were matter, we would gradually fill the room with it, and we would all walk around glowing on the head and shoulders where the light had fallen. Magnetic lines of force, like electrons, travel in some materials better than others. Iron, nickel, cobalt, and various alloys are used to conduct lines of force. However, where electrons won't travel through things like wood, plastic, and glass, lines of force pass through these unchanged. So electrons don't flow unless they are allowed to, while lines of force flow unless they are stopped.

If we take a wire and wrap it about a core made of a magnetic substance, and then pass an electron current through the wire, the lines of force created by the moving electrons will be concentrated into the magnetic core. The core in turn will tend to hold the lines, and may continue to hold some after the current is turned off. Lines of force that remain in a core when no current is nearby are called residual magnetism.

If we take a coil of wire and connect it to a sensitive meter or measuring device, and then pass a core with residual magnetism past or through the coil, the meter will indicate that as the core passed, a current attempted to flow and an electromotive pressure was created. If a complete path from one end of the wire to the other is present, the current will indeed flow because the magnetic fields of the electrons (remember they are spinning) will interact with the passing magnetic field of the core and this will force the electrons to move - this is called motor action. If the ends of the wire coil are connected to an amplifier device, the electromotive pressure or voltage built up at its ends can be seen by the circuitry and put to use, perhaps as 1's and 0's.

Magnetic Data Storage

We can take advantage of these phenomena through the laws of physics dealing with induction. Induction is the event in which magnetic lines of force created by one device can create, or induce, lines of force in a nearby magnetic medium. A permanent magnet attracting nails on a table is a simple example of induction - the lines of force in the permanent magnet create lines of force in the nail. If there is sufficient induction, the attraction between the two metals is so great that motor action will draw the nail toward the magnet.

You will recall that the storage of data as binary 1's and 0's is easy to do in electronics since the laws of physics governing electronics are inherently two-state in nature. The existence of lines of force is also a two-state system. The lines can either be there (1) or not (0), or, more likely, traveling clockwise (1) or counter clockwise (0) around a core or wire.

Storage of data magnetically requires two main items. The first is a medium, that is, something to store the lines of force between uses. This is usually a coating of spray paint applied to a strip of plastic, forming magnetic tape, or to a circle of plastic for a floppy diskette or a rigid metal disk for fixed disks. The coating has undergone extensive development over the years as the industry continues to cram more and more data in a smaller and smaller space, using chemistry as a tool. The coating is sprayed onto the medium and cured, polished, and honed to a smooth finish. The coating consists of extremely fine particles of magnetic elements including iron in a liquid binding agent. When dry, the surface can store lines of force in its magnetic material.

The second item is a doughnut-shaped core of magnetic material around which is wrapped a coil of wire. If a current is passed through the coil, lines of force will build up around the wire, and be captured in the material of the core. Thus, with current flowing in the coil, the core will become magnetized with flux flowing in, say, a clockwise direction around it. If the direction of current flow is then reversed, the lines of force created in the coil will reverse direction, and the lines in the core will also reverse direction. So, by passing current in one direction or the other in the coil, we can force lines of force to flow in the core in one of two directions. This device is called a magnetic head. At one point, a notch is cut into the core to force the lines to hop across through the air. This is called an air gap. It is the place that is directly opposite the medium.

To store data, we simply pass the medium by a head that has current flowing in its coil. As the magnetic surface passes the head, the lines of force jumping the air gap in the nearby head will find an easier path through the magnetic surface of the medium than it will through the air in the gap. The lines therefore pass to the medium, along it, and return to the core on the other side of the gap. This induces lines of force to be created and stored in the medium that is passing the gap. If the direction of current flow is reversed in the head winding, the lines of force jumping the gap reverse, and the induced lines captured by the passing medium are also reversed. Thus it is easy to create areas on the medium that are magnetized in one direction, then in the other. These areas are called dipoles, and creating them on a medium is called writing data.

If the head coil is removed from the current source and instead connected to the inputs of a sensitive amplifier, we can again use the laws of induction in reverse. As the medium that has dipoles on it passes the head gap, the dipoles induce magnetic lines into the head core. As the lines change, they induce a voltage or current in the head winding, and this, when amplified, can be interpreted as data. This process is called reading data. So we can use the same read-write head to both write the data onto a medium, and read it back later.

In reality, the act of reading is looking for the points on the track where one dipole ends and the next begins. At these points, the interaction of the lines of force on the medium with the head will be greatest, according to the formula for induction.

Magnetic Tape

Magnetic tape consists of a ribbon of plastic upon which the magnetic coating is sprayed. The plastic can be any of several types, with polyvinyl and polyurethane being common. The standard width for most tapes today is 0.5", and smaller sizes down to 8 mm are available for cassettes and other data storage devices. The standard thickness is 0.5 mil, that is, 1/2 of 1/1000 of an inch. A standard 10-inch reel will contain 2,400 feet of tape.

The coating for tape must be flexible, since the tape makes many twists and turns as it passes through a drive. The coating must also be as resistant to friction and wear as possible, since the tape touches not only itself on the reel, but also the head metal and other guides and rollers. The use of tape assumes contact between the tape and the head mechanism. This contact can cause wear and flaking of the coating as the tape deteriorates with age. As data are sometimes stored for long periods on tape, it is common to find a specially prepared storage room with temperature and humidity control to minimize the aging process.

Data on tape are arranged in tracks, with both 7-track and 9-track tape being common. The tracks run longitudinally along the length of the tape. They are defined by either 7 or 9 read-write heads in the head assembly across the tape width. The 7-track tape was created to store data in Standard Binary Coded Decimal, while the 9-track tape was used for EBCDIC coding. Each track represented one bit of a binary byte.

Data are arranged on tape in a parallel-by-bit, serial-by-character order. The bits of an entire byte or character are all read or written across the width of the tape at the same instant. However, fields and records of data are written one byte or character at a time along the length of the tape. Obviously, tape is a sequential access method of data storage.

The amount of data a given tape can hold is determined by several things. First, the tape or bit density defines how many characters or bytes can be written in a linear inch of tape. This is usually referred to as Bits Per Inch, abbreviated BPI. Depending on the nature of encoding used for the data, values of 200 and 556 (both obsolete), and 800, 1600, and 6250 are common. Second, the organization of the data as they are written is important. Data are written in records similar to punched cards. Between each record is an Inter-Record Gap, or IRG, that is required to allow the tape to stop and start motion between records. These gaps are typically 0.6" long. If the data are written in many short records, a good portion of the tape will be wasted as IRGs. If the data are written in long records, there will be fewer IRGs and more data on the tape.

Also important is the speed of the tape through the drive. This is measured in Inches Per Second, abbreviated IPS. Typical speeds have included 16.25, 37.5, 75, 112.5, 100, and 200 IPS. The data rate of a given drive may be determined by multiplying the tape density by the drive speed. This is the speed that the data have as they pass to the drive on a write or from the drive on a read.
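
The arithmetic is simple enough to check in a few lines of Python. The density, speed, and gap figures come from the text; the 800-byte record length is an assumption for illustration:

    # Tape capacity lost to inter-record gaps, and raw data rate
    bpi, ips, gap_inches = 1600, 75, 0.6
    record_bytes = 800                           # assumed record length

    record_inches = record_bytes / bpi           # 0.5 inch of data per record
    used_for_data = record_inches / (record_inches + gap_inches)
    print(f"tape holding data: {used_for_data:.0%}")      # about 45%

    print(f"data rate: {bpi * ips:,} characters/second")  # 120,000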

Floppy and Fixed Disks

We have seen two means by which a specific item of data can be located in a storage device. In the case of a computer's main memory, the data are located randomly. This means that we can go directly to the specific item of data without going through any other storage location to get there. We have also just seen the classic example of sequential access, the magnetic tape. On a full reel of tape, it might take considerable time to find the last record at the end of the tape if you are beginning at the start of the tape. Both techniques have their good and bad points, depending on the need of the moment.

There is a third method of data storage called Direct Access Storage Devices, abbreviated either as DASD or DASDE. DASD combines the best of both random and sequential access. Data are arranged on the medium in concentric circles called tracks. Since these tracks may contain a great deal of data, each track has a number of divisions called sectors, which can be thought of as slices of pie in shape. A DASD device first finds the major storage portion of the device randomly, by accessing any track randomly. The device can go from track 3 to track 25, then back to track 7, without having to process data in the intervening tracks in any way. The device then finds the specific item of data sequentially, starting at the beginning of the track it has found and proceeding through the sectors until it finds the one it needs. So we combine random and sequential techniques to gain efficiency and speed of data retrieval and storage.

The general approach to data storage is the same in floppy diskette drives and fixed disk drives. Generally, data are stored in concentric tracks, which can number 40 or 80 on floppy diskettes, and many thousands on fixed drives. Each track has an arbitrary starting point for its rotation called an index. Starting at the index, the data are written in a serial-by-bit, serial-by-character format along the track. This is because, unlike a tape drive head, there is only one gap in the disk drive head.

Data records are typically fixed-length, and the length corresponds to the size of the sector, or pie slice. The actual starting point of each sector around the circle can be defined by a notch or hole in the disk, and a system using this approach is called hard-sectored. More typically for PC's, the sectors' starting points are defined by a timing loop in the disk drive controller that counts microseconds from the index and compares this to the speed of rotation of the disk. Such systems are called soft-sectored. The only hole or notch is therefore the index. The sectors can be variable in number, from 8 through 27, and can contain variable numbers of bytes, from 128 on up. In addition to the data, each sector contains a header, which identifies the sector uniquely on the disk surface, and several gaps, which are waste space that allows the controller electronics some time to compute and respond (remember that the disk is always turning, whereas tape can be stopped between records).
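
Capacity follows directly from this geometry. As a worked example, the figures below are those of the common 1.44 MB 3.5-inch diskette (80 tracks per side, two sides, 18 sectors of 512 bytes):

    # Capacity = tracks/side x sides x sectors/track x bytes/sector
    tracks, sides, sectors, sector_bytes = 80, 2, 18, 512
    capacity = tracks * sides * sectors * sector_bytes
    print(f"{capacity:,} bytes = {capacity // 1024:,} KB")   # 1,474,560 = 1,440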

The storage of data on a disk at the request of a program using an operating system like DOS requires a great deal of further study beyond the scope of this course. However, the CT001'er will encounter one other term that should be defined. This is the cluster. Depending on the medium being used, a cluster is defined as one or more sectors taken as a logical whole. In the case of the 3.5" diskette, for example, a cluster is two sectors long. DOS never sees anything but clusters, while the drive only knows sectors. The translation between the two is up to the lower level coding in DOS and the ROM BIOS.

Another concept that is similar between floppy and fixed disks is the cylinder. A cylinder is easier to see on a fixed disk because the drive may have many platters and therefore many recording sides. The floppy can have cylinders too, but it has only two sides to work with. Basically, a cylinder is defined on a multiple-head drive as the same head position, or track, on all of the surfaces at the same time. For example, if the head mechanism on a drive is positioned at track 12, then every head on the drive can see track 12 at the same time. This logically describes a vertical cylinder across all the media surfaces. A floppy cylinder has a maximum of two tracks, while a fixed disk can have as many tracks in a cylinder as it has surfaces.

There is a particular reason why cylinders are important in data storage. This has to do with seek, the process of finding a particular place on a disk to read or write. In a DASD device, seek is divided into two types: mechanical seek and electrical seek. Mechanical seek deals with the random selection of a track, as discussed above. The carriage on which the heads are mounted is moved toward or away from the center of the disk and can go from one track to any other randomly. When the carriage arrives at the target track, the mechanism waits to find the index. When the index is found, the head for the specific surface is activated and the electrical seek begins: the sequential search along the track to the desired sector.

You will note that mechanical seek involves a physical motion of the heads, and will therefore take some time to complete. Electrical seek is faster, because there is no mechanical motion of the heads. It follows that if we can minimize the mechanical seeks, we can have an overall faster response from the drive. If a large amount of data is to be transferred, say more than one track's worth, then going from track to track across the surface uses multiple mechanical seeks and wastes time; this is called track mode. If, however, the data are arranged vertically at the same track position on every surface, then we can go from track to track vertically using only electrical seeks, and save much time. This is the standard method used by DOS for both floppy and fixed disks in current systems, and is called cylinder mode. A comparison of the two modes is sketched below.
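
This sketch counts the cost of each mode for a transfer spanning several tracks. The seek timings and drive geometry are invented round numbers, chosen only to show the shape of the saving.

    # Compare track mode and cylinder mode for a transfer spanning 12
    # tracks on a drive with 4 surfaces. Timings are hypothetical.
    MECHANICAL_SEEK_MS = 10.0   # move the head carriage to a new track
    ELECTRICAL_SEEK_MS = 0.1    # switch to another head at the same position

    tracks_needed = 12
    surfaces = 4

    # Track mode: every track change is a mechanical seek.
    track_mode_ms = tracks_needed * MECHANICAL_SEEK_MS

    # Cylinder mode: one mechanical seek per cylinder, electrical within.
    cylinders_needed = tracks_needed // surfaces        # 3 cylinders
    cylinder_mode_ms = (cylinders_needed * MECHANICAL_SEEK_MS
                        + tracks_needed * ELECTRICAL_SEEK_MS)

    print(track_mode_ms)      # 120.0 ms
    print(cylinder_mode_ms)   # 31.2 ms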

The primary differences between fixed and floppy disk systems center on capacity and speed. Floppy disks, like magnetic tape, are a contact medium: the head is in contact with the surface of the diskette coating any time the diskette is rotating (whenever the drive's activity light is on). As a result, diskettes are considered consumable media. This means that they are guaranteed to wear out, and they usually do it when you are least prepared for it. Hence, the habit of backup is important to develop early. Backup simply means that data and program material you create, that are your own intellectual property, should be copied every time you make changes, so that if the diskette dies without warning, you have at least some of your work saved.

Because a diskette is a contact device, the speed of rotation of the diskette is limited. Five and one-quarter inch diskettes topped out around 350 rpm, while the 3.5" diskettes rotate at 600 rpm. The access time, essentially the time for a mechanical seek, in a floppy drive is large, on the order of hundreds of milliseconds. The data density, that is, how many bytes can fit on one track, is limited by the speed of the medium, since bits cannot be placed too close together without interfering with one another.

Because fixed drives do not have their heads in contact with the medium, the speed of rotation can be much greater. Speeds of 3600, 4200, 4500, 5500 rpm and greater are becoming common. This speed allows a much greater data rate for transfers between the drive and the computer, and the existence of multiple surfaces allows a more efficient cylinder-mode arrangement of the data.

Disk Drive Controllers

Of interest to PC enthusiasts are several controller options that are common. It is beyond the scope of this course to address all the details, but here are some essential facts for reference.

The disk controller is either a discrete card inserted into a slot or circuitry built onto the motherboard. It logically stands between the drive and the computer system and its operating system. All requests from DOS or the BIOS to move data to or from a drive must pass through the controller. The level of intelligence built into the controller, or the lack of it, can be traced by looking at the evolution of the PC.


PC/XT: This machine was introduced in 1983 and was a standard IBM PC with a 10-megabyte drive, a controller, and a larger power supply included. Both the controller and drive were dumb: they depended on the motherboard processor to do all their computation for them. When the drive controller card was plugged into a slot, a block of programming was inserted into the memory map in the upper 384K of the system. This code was an extension of the motherboard's ROM BIOS, and allowed the machine to deal with the larger volume of the drive. PC DOS version 2 was introduced to accompany this addition, and it provided for subdirectories, larger FAT tables, and API's. The controller was an 8-bit device, transferring single bytes at a time to the computer. The drive used the ST412/506 interface defined by Seagate.

PC/AT: This machine was introduced in August of 1984. It provided a 16-bit controller, a 20-megabyte fixed drive, and an Intel 80286 processor. Although IBM and Microsoft later released OS/2 version 1 for the machine, it was almost exclusively used by customers as a fast PC. The drive was connected to a large controller by the same ST412/506 interface; however, much more control of the drive was given to the controller to off-load the motherboard processor. Buffering was included so that data could be stored temporarily, allowing less interaction with and interference on the motherboard. The speed of the drive was increased as well. PC DOS 3.0 was introduced to address the directory structure of the larger drive.

INTEGRATED DRIVE ELECTRONICS (IDE): This interface is a result of the miniaturization of the traditional PC/AT interface. As semiconductor technology advanced, the drive electronics needed to move the heads and transfer data shrank to the point of needing only a couple of high-density chips. Similarly, the electronics needed for the controller were shrinking. Eventually, it was possible to place the electronics for the drive and the electronics for the controller together on the drive itself. This made the board plugged into the slot nothing more than a connector, since all the brains were now on the drive. These drives were also smarter, further off-loading the motherboard system.

SMALL COMPUTER SYSTEMS INTERFACE (SCSI): A variety of more advanced interfaces have been used, but the most successful and most widely supported is SCSI. This is a subsystem, not simply an interface. The controller board contains not only an interface to the motherboard but also a complete intelligent controller that can carry on business with the drives with no intervention. The drives need to be smart too, and they have their own controllers built in as well. When a request from the operating system comes to the SCSI controller for an attached drive, the controller will contact the drive, send it commands, monitor the data transfer, and handle ending status with little or no help from the system. When the transfer is completed, the SCSI controller will simply advise the system that it is done. Devices other than disk drives can be SCSI, including tape drives, printers, video display adapters, and network adapters. There are several versions of SCSI, including a fast/wide version that can handle 20 megabytes of data transfer per second. Unfortunately, there is a variety of non-standard products in the market, and there is no guarantee that brand X drive will work with brand Y SCSI adapter, even if they say so.


Data Communication

Data Communication has also been called Telecommunication and Teleprocessing. The essential idea is that information (data) is sent over long distances between systems through a hostile, non-digital medium. The classic example is using a modem to communicate between a terminal and a mainframe over telephone lines. The reality has now expanded to the point where the communication is likely to be digital as well as analog, and the distances can range from a few feet to around the world.

In the beginning, there was the mainframe and the terminal. The mainframe was the center of the universe, and did all the work. When video display devices and automated typewriters became affordable, attaching these to the mainframe within the same building became the preferred method of data entry and interaction with the frame. Initially, these were dumb terminals, which had no ability to do anything other than display letters and numbers on the screen. They had no processor, and could not compute anything. They had no memory other than that needed to keep the characters on the screen. The display was rudimentary, with little or no variation beyond plain text.

As technology advanced and the microprocessor became available, smart terminals appeared, both as printing and video display devices. These could give certain characters on their displays attributes, such as blinking, reverse video, automatic underlining, etc. They could protect certain fields on the screen and unprotect others, thereby allowing the filling in of blanks on a form without unintentionally destroying the labels for the blanks. The video displays could have an attached printer and could be directed to feed data to it under control of the mainframe.

Just before the advent of the personal computer, a variety of intelligent terminals began to appear. With or without microprocessors, these machines were able to carry on their own business locally, and needed to communicate with the mainframe only occasionally. These first appeared as automated cash registers in department stores, followed by supermarkets. With the advent of the PC, the ultimate intelligent terminal appeared: one that could do everything for itself, and contact other systems only to share data or results.

Using these terminals and the still-present mainframe, a variety of methods sprang up. Remote Job Entry was an early attempt to allow a computer at a remote location to do your local work. Card readers and printers with communication controllers could feed a deck of punched cards to the remote system over the phone lines, then print the result. The cost of the phone lines and the machines was still much less than the cost of another mainframe. Timesharing was a similar technique in which terminals, either remote or at the system site, got little slices of a computer's time (time slicing) and shared the system with others. Each user, the average human being much slower than the average mainframe, felt as if the system were dealing just with him/her. In reality, the computer was dealing with many users at once, and by taking advantage of the techniques of multiprogramming, all the users could be served.

As technology has progressed further yet, two approaches have sprung up that take advantage of the ideas of data communication and intelligent terminals. Toward the end of the 1970's, the term distributed processing became popular. As people saw that the microprocessor and intelligent terminals were taking on more and more of the work formerly done by mainframes, the idea was put forth that we didn't need a mainframe at all, but rather a network of highly intelligent terminals that could do their own local work and share data as needed. An example would be the LA Community College District, which at one time wanted to place minicomputers at each campus and have no mainframe at the central office. The processing needed by the district as a whole would be distributed to the campuses, and a sort of parallelism would result.


The current thinking along these lines is based on the fact that distributed processing may or may not have worked, depending on a large number of variables. In many cases, it was found that although PCs and intelligent terminals could do a lot of the work, there were some things that the mainframe could do better. An example is a centralized archiving function. The student records for a school district are an enormous collection that simply can't be split up among campuses without losses in speed and accuracy. So the idea is to let the PCs do the work of interfacing with the users, which they do well, and let the mainframe do the archiving and record keeping. The two can communicate whenever necessary. This is called client-server computing.

With the advent of the PC in more and more homes and offices, a number of subscriber services have appeared. These are simply large systems with extensive communication ability that can connect to hundreds of dial-in users at once, and can also communicate with other systems that have specific services to offer. Examples include Prodigy, CompuServe, America OnLine, etc. All of these offer access to services such as airline reservations, stock market quotations, purchasing services, etc. One of the most popular services is electronic mail, or Email. This is an outgrowth of several different services; it allows you to send and receive messages from others at remote locations, similar to a typed letter or telegram.

The Internet is best defined as a "network of networks", where initially university campuses and Department of Defense offices, each having their own networks, connected together to share data and email. Only recently, in the last three years, have the commercial interests taken over and tried to make money with it. Originally, it was designed as an open, free environment where a "hacker" was a good thing to be, and where a great deal of experimentation by college students at all hours developed a lot of the technology and software we now use. The use of the World Wide Web (WWW) as a resource for research and communication has been overshadowed by entertainment and unnecessary traffic.

Data Communication Hardware

A complete examination of the current state of data communication technology could take whole semesters in itself. However, as an introduction, here are some essential facts. Keep in mind that the original way of dealing with data transfer between computers was via the telephone line. The telephone system, particularly the one in the typical home or office, may be fine for humans, but it is an exceptionally hostile environment for digital data to use as a path. It is an analog world, and until recently, it was necessary to take a digital data stream and convert it to analog signals, pass it from point A to point B as such, and then convert the analog signal back into digital data at the receiving end. This is still how it is done for the average user. The device used to do this conversion is called a MODEM, which stands for modulator-demodulator, the modulation being the conversion from digital to analog, and the demodulation being that from analog back to digital again.

LINE CLASS: Line class defines how well a particular transmission line can carry certain types of signals. Voice-grade lines are the class you would have on a home telephone; they allow two people to speak to each other without noise or confusion. Sub-voice-grade lines have so much static and interference that people would quickly hang up, but they can still be used for some types of telegraphy. Leased lines have better than voice-grade ability and can handle data at thousands of characters per second.

BAUD RATE & BITS PER SECOND: The idea of baud rate was put forth by Jean-Maurice-Émile Baudot, an officer in the French telegraph service, who was interested in the automation of telegraphy. He defined the unit called the baud as the reciprocal of the duration of the shortest signaling element. It is therefore the theoretical maximum number of binary bits that can be sent in one second by a given device. Bits Per Second (BPS), however, is the actual number of bits sent in a given second. If the baud rate of a machine is 2400, but no data was sent for a second, the baud rate would still be 2400 and the BPS rating would be 0.
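
A worked example makes the distinction concrete. Assuming the common asynchronous framing of one start bit, eight data bits, and one stop bit (an assumption, since framing varies), each byte costs ten line bits, so the useful data rate sits well below the baud rate.

    # Effective throughput of a 2400-baud line with assumed 8N1 framing:
    # 1 start bit + 8 data bits + 1 stop bit = 10 line bits per byte.
    baud = 2400               # signal changes per second (binary signaling)
    bits_per_byte_on_line = 10

    bytes_per_second = baud / bits_per_byte_on_line
    print(bytes_per_second)   # 240.0 bytes per second, at best

    # If the line sits idle for a second, the device is still a
    # 2400-baud device, but the BPS achieved for that second is 0.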

HALF & FULL DUPLEX: Duplexing has to do with how many data streams are open between two systems at a time. Half-duplex transmission means that data can go either way, but only in one direction at a time. Full-duplex transmission means that data can go both ways at the same time.

MICROWAVE & SATELLITES: These are methods of sending large amounts of data over long distances, both of which are expensive and no longer considered except for certain purposes.

FRONT END PROCESSOR: A Front-End Processor (FEP) is a specialized computer that is designed to operate between a mainframe and a group of dial-in circuits. The mainframe is fast and deals with large amounts of data at a time, while dial-in terminals are much slower and deal with data on a character-by-character basis. The FEP does the work of speed and protocol conversion so that the mainframe doesn't spend time on trivia.

Local Area Networks

Local Area Networks (LANs) have become a major player in the business in the last ten years. As the idea of the PC caught on, and as more products and technical advances appeared, it became obvious that it should be possible to connect PCs together so that they could share data and printers. This was the first use of LANs: the sharing of files between systems, and the sharing of expensive printers.

The first commercial approaches were based on the idea of peer-to-peer networking. This implied that all the PCs connected together were of equal capability and importance: all had fixed disks, all had sufficient processing power, etc. Each peer would make available, or share, the resources of its machine that it would allow others to see, and keep private the files and other items that were secret to that one system. When a peer wanted a file located on a different system, it would make a request for it. If the station where the file was located had marked it as sharable, the requester was granted access to it. A more recent version of this is Windows for Workgroups.

Now, the standard approach is to have one or more large systems attached to a group of smaller ones, with the large one being the reservoir, or storage bucket, for all the shared programs and important files. This machine is called a server. It might have special printing or communication hardware as well, which could be shared on demand.

There are several types of data conductor in current use in LANs. The original was called coaxial cable, in which an inner wire is surrounded by insulation, then by an outer shield or conductor, which in turn is coated with an outer insulation. The two conductors are "co-axial", that is, they share a common center. Coax has been used for many years for radio and television transmission, and can be used for data, although it is subject to many problems based on the laws of physics dealing with transmission lines (reflected and standing waves, etc.).

Another type of wiring is called Shielded Twisted Pair (STP). Twisted pair is simply two wires twisted together over a long distance, which keeps them together and provides a certain small amount of noise cancellation. STP is a heavy-duty version of this, in which the wires are shielded by a braid or heavy foil. This protection does two things. First, it protects the signals in the cable from being affected by outside interference. Second, it gives the line a characteristic impedance as a means of dealing with the transmission-line laws of physics.

Unshielded Twisted Pair (UTP) is an attempt to reduce the cost of STP and was originally intended to make use of the telephone lines strung through commercial buildings. This is a hostile environment and so the term UTP can mean anything from junk telephone wire to expensive teflon-coated cable.

Coax, UTP, STP, and any type of conductor that uses metal and passes electrons is subject to the laws of physics for transmission lines and induction. Current passing in a wire can induce interference into adjacent cable. Outside interference, such as spark plug noise, motors, and lightning can induce interference into the cable. Ideally there should be a method of data transfer that is not subject to these problems.

Fiber optic cable is subject to none of the problems of induction. It does not generate interference, nor is it affected by it, and it supports far higher data rates than electrical conductors. This is because the signal is carried by light rather than by electron flow. Electrons, you will recall, are matter - they take space and have weight, and their electromagnetic fields can interact with other materials nearby. Light, on the other hand, is not matter, but pure energy, similar in concept to magnetic lines of force. Since no matter moves in a light beam, and since it has no electromagnetic polarity to couple with its surroundings, there are no problems with interaction between a light beam and other conductors. In fact, you can shine light in opposite directions down a fiber and the two beams will not interfere with each other.

The fiber itself is a strand of flexible glass that is coated with an outer layer of a second, different glass. The boundary where the two glasses meet, along the length of the cable, forms a reflective surface. Light traveling down the inner fiber will attempt to leak out, but is returned into the fiber by the reflection. If coherent light is used (light of one frequency, such as from a laser), the losses are very small, so the light can travel over long distances with little deterioration in signal. The unit of light discussed in fiber optics is the photon. Because light can be switched and detected at far higher rates, and suffers lower losses over distance, the speed of data transfer can be much greater than with electrical means.

The topology of a LAN is the road map that information follows as it makes its way around the system. Data are sent between computers in packets, which are fixed-length groupings of bytes with an addressing scheme and error checking built in (a toy example is sketched below). Common topologies include the star, where all the stations are wired to a central point and the cabling radiates outward from that point in a star pattern; the bus, a general term that describes a central long path with side paths attached to it along the way; and the ring, in which the data path is a giant circle, and the packets pass through all the stations as they go around. The term backbone describes a high-speed data path that distributes the signals to local areas; it is not restricted to a particular type of network.
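
As a rough illustration of what "addressing and error checking built in" means, here is a toy packet format in Python. The field sizes and the trivial checksum are invented for illustration; real LAN packets (Ethernet frames, for instance) are considerably more elaborate.

    import struct

    # Toy packet: destination, source, payload length, payload, checksum.
    # Field sizes and the checksum scheme are invented for illustration.
    def make_packet(dest, src, payload):
        header = struct.pack(">BBH", dest, src, len(payload))
        checksum = sum(header + payload) & 0xFFFF   # trivial error check
        return header + payload + struct.pack(">H", checksum)

    def check_packet(packet):
        body, (checksum,) = packet[:-2], struct.unpack(">H", packet[-2:])
        return (sum(body) & 0xFFFF) == checksum

    p = make_packet(dest=3, src=7, payload=b"HELLO")
    print(check_packet(p))                  # True
    print(check_packet(p[:-1] + b"\x00"))   # False - corrupted in transit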

Two types of networks are very common today. The first is called Ethernet, and it was proposed in the early 1970's by the Xerox Corp. and others. It is characterized by using coaxial cable as its main transmission medium, both in a "thick" version that looks like water pipe but is low loss, and a "thin" version for short distances. Because it uses coaxial cable, it is limited to a 10 megabit-per-second signaling rate, and is subject to the physics limitations of such. There are a variety of expansions of this type of network, using multiple conductive paths, etc., to get more speed.


The token ring technique was proposed by IBM and has been adopted by NASA for the space station. This is a ring topology, with a top speed of 16 megabits per second. It uses UTP or STP, but no coax. The CITYnet system at LACC is of this type.

Fiber Distributed Data Interface (FDDI) is an outgrowth of the token ring. It assumes that the data path is 100% fiber, with no copper conductors involved. As such, it has a 100 megabit-per-second signaling rate. It is essentially a token ring topology, and similar to token ring in operation.

Asynchronous Transfer Mode (ATM) is the state of the art in data transmission and is aimed at users who need to send data at high speed to and from a large number of users at the same time. It uses multiplexers and specialized fixed-length packets called cells to merge a large number of users into a single data stream, transmit the stream, then break the users' streams apart at the receiving end. It is supported by IBM and others.

The word protocol is very important in networking. A protocol is an established way of doing something. There are a great many protocols in data communication, each addressing some particular aspect of sending bytes somewhere. A lot of these are historical, or replace old protocols with updated versions. Many are the result of promotion by a particular company and work only with their equipment. Here are three of the most important to PC networks.

LAN Manager/LAN Server: These are server software packages that support protocols originally proposed by Microsoft and IBM before they got their divorce. The protocols include NETBIOS, an API that allowed programmers to access the network easily; a later version called NETBEUI; and finally NDIS, the driver interface specification still used for network adapters.

IPX: This protocol is part of the Novell NetWare suite of server software, and was Novell's own approach at trying to be the big canary. Companies with large Novell installed bases use it as their standard. However, the NetWare methods and those of IBM/Microsoft were diametrically opposed, and a network with both is difficult at best to manage.

TCP/IP (Transmission Control Protocol / Internet Protocol): This series of protocols was created long ago, when the internet was first developed under the Department of Defense. It is a simple protocol compared to the others, and has been accepted as an industry standard. It does not support a lot of the flashy methods of the other systems, although a great deal of development has been done to accommodate graphics, sound, etc. While each company in the business has tried to make its protocols the world standard, none has succeeded, and TCP/IP still reigns as the only standard supported by everyone.


SECTION 4 - Software

Which is it?

This course can only begin to offer the ideas and practices involved in the creation and operation of programs and software that have evolved since computers first began. There are entire sciences that have built up around even small ideas that combine with others to form an enormous body of knowledge, skills, and tools with which programs may be created. As a beginning, it is convenient to divide software into two areas, system software and applications software. System software includes programs and related material that is essential to the function of the computer itself. It deals with the computer at the hardware level, and contains programs and routines that enable the computer to function, to communicate with the I/O devices, and to communicate with the user. It provides a platform or base of software support upon which the applications programs can operate, making available the resources, storage, I/O, and hardware of the system to the user's needs. It is generally referred to as the operating system.

Application software consists of those programs and material that are designed to provide the user with some particular type of service. Examples include word processing, accounting, drawing, communicating, etc. Application programs are designed to deal with the user of the system, that is, interact with the user via input and output devices such as keyboards, mice, screens, and printers. When an application program needs the assistance of an I/O device or system resource, it asks for help from the operating system. The operating system will carry out the request, such as reading or writing from/to disk, displaying something on the screen, or sending something to a printer. If data has been entered into the system as a result of the request, the data is forwarded by the operating system to the application to allow it to continue processing.

The interface between the operating system and the application software is the topic of much development and contention. Generally, the operating system provides a common method by which the application can ask for help. The MSDOS "Interrupt 21h" is a classic example. In this case, the application sets up values in certain registers of the computer and in certain memory areas, then executes an INT 21h instruction. The operating system responds to the interrupt, inspects the registers, and acts accordingly. When the requested process is finished, the operating system returns control of the computer to the application. As operating systems became more complex and the Graphical User Interface (GUI) evolved, different interfacing techniques of this kind were developed. There now exist a number of method sets by which the application communicates with the operating system, and the one used depends largely upon which operating system is in use in the machine. These interfaces are called Application Program Interfaces (APIs).
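
The INT 21h mechanism itself can only be shown in assembler, but the shape of the transaction (application asks, operating system acts, control returns) survives in every modern API. As a loose analogue, not the DOS mechanism itself, Python's os module exposes thin wrappers over the operating system's own calls:

    import os

    # The application asks the operating system to perform I/O on its
    # behalf; the OS does the work and returns control (and a result).
    fd = os.open("demo.txt", os.O_WRONLY | os.O_CREAT)
    os.write(fd, b"written via a request to the OS\n")
    os.close(fd)

    # Reading it back is another request to the operating system.
    fd = os.open("demo.txt", os.O_RDONLY)
    data = os.read(fd, 64)
    os.close(fd)
    print(data)   # b'written via a request to the OS\n'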

THE OPERATING SYSTEM

The operating system can be roughly divided into three parts, the Kernel, the I/O Control System (IOCS), and the Shell.

The Kernel is the heart of the system and provides the brains and the personality of the software environment. The kernel manages the passage of program instructions to the processor, communicates with the I/O system, sends and receives information to/from the shell, and determines the overall operation of the computer. In those systems where multiprogramming or multitasking is supported, the kernel makes the decisions about which request for processing time is allowed first or last, and how long the time slice will be for the selected process. The overall nature of how the system functions, how you write programs to run on it, and its general personality are determined by the kernel.

The Input Output Control System (IOCS) provides the interface between the kernel and the computer's hardware, particularly the I/O devices. The IOCS can be divided into two parts, the physical IOCS and the logical IOCS. The physical IOCS is responsible for communicating with the electronics of the computer and its I/O devices. It deals with sending data to a device or receiving data from it, checking to see if a device is available for transmission, waiting on a device if it is not ready, and a myriad of error checking and reporting techniques. In the PC, the programming contained in the ROM BIOS is the classic example of low-level code that deals directly with the hardware.

The logical IOCS is involved with data blocking and unblocking, high-level error checking, and direction of data flow through the computer. Data blocking involves the conversion of data from an organization that makes sense to the programmer to one that is demanded by the hardware. For instance, in MSDOS, the programmer may decide that 70 characters is sufficient for one data storage record. However, the disk drive deals in 512-byte sectors. As the program executes and data is taken from the user via the keyboard and sent to the disk, the logical IOCS portion of MSDOS converts the 70-byte records into groups of 512 bytes so that they will fit into a disk sector; this is the process of blocking. Since 70-byte records will not fit evenly into a 512-byte group, the point at which one record must be divided and carried into a second 512-byte block is dealt with by the logical IOCS. When reading information from a disk 70 bytes at a time, unblocking must occur to make the 70-byte records whole again as they are taken from the 512-byte sectors. A sketch of both operations follows.
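
This sketch uses the 70-byte records and 512-byte sectors from the example; the zero-padding of the final sector is an assumption, since a real IOCS tracks record counts in its own bookkeeping.

    RECORD = 70   # logical record size chosen by the programmer
    SECTOR = 512  # physical sector size demanded by the drive

    def block(records):
        """Pack fixed-length records into sector-sized blocks; a record
        may be split across the boundary between two blocks."""
        stream = b"".join(records)
        pad = (-len(stream)) % SECTOR               # fill the last sector
        stream += b"\x00" * pad
        return [stream[i:i + SECTOR] for i in range(0, len(stream), SECTOR)]

    def unblock(blocks, count):
        """Recover the first `count` records from sector-sized blocks."""
        stream = b"".join(blocks)
        return [stream[i * RECORD:(i + 1) * RECORD] for i in range(count)]

    records = [bytes([65 + i]) * RECORD for i in range(10)]   # 10 records
    blocks = block(records)
    print(len(blocks))                       # 2 sectors hold the 700 bytes
    print(unblock(blocks, 10) == records)    # True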

The Shell is the user interface for the operating system. It is easy to see in MSDOS, since the user at the keyboard, watching the screen, is interacting with the system via the shell. MSDOS provides a default shell called COMMAND.COM. This is the module that provides interaction with the user via keyboard and screen at the command-line level. When a user enters a command such as COPY at the "C:\" prompt, the prompt itself, the motion of the cursor, the appearance of the letters C-O-P-Y on the screen, and the subsequent function are all provided by the shell. It knows how to interact with the kernel, and will direct the appropriate requests to the kernel based on the user's entry.
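
The essence of a shell is a loop: print a prompt, read a command, dispatch it, repeat. Here is a minimal sketch of that loop, nothing like COMMAND.COM's actual code; COPY and EXIT are the only built-ins it knows.

    import shutil, sys

    # A toy command-line shell: prompt, read, dispatch, repeat.
    def copy(args):
        shutil.copyfile(args[0], args[1])

    commands = {"COPY": copy, "EXIT": lambda args: sys.exit(0)}

    while True:
        line = input("C:\\> ").strip()
        if not line:
            continue
        verb, *args = line.split()
        handler = commands.get(verb.upper())
        if handler is None:
            print("Bad command or file name")
            continue
        try:
            handler(args)     # the shell passes the real work downward
        except Exception as err:
            print(err)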

While COMMAND.COM is the usual MSDOS shell, it is possible to replace it with another of your own creation. A common example of a replacement shell is the Graphical User Interface (GUI) of Windows on a PC. Here, the command line system has been replaced with a graphical representation of functions and programs through the use of icons, little pictures that represent a program or function. The entire screen of the display is treated as a graphical entity. The placement of multiple windows, or display boxes for programs and functions, and the execution of the programs themselves treat the display in a graphical (pixel by pixel) rather than a text (character by character) manner.

Operating System Characteristics

The development of the operating system over the years has followed that of the computer hardware. As the hardware became faster and was given more ability to do work, the operating system followed suit. In some cases, the operating system was modified to implement a particular feature, and, when complete, the hardware was modified to implement the feature more easily. Some of these features follow.

Multiprogramming is the technique of allowing what seems to be more than one program to execute at a time in a system. However, if the processor has only one ALU, then only one instruction can be executing at a time in a traditional system. Therefore, the system gives small amounts of time to each of the several programs that are running at the same time (timeslicing), and the appearance is of several things happening at once, even though only one is truly happening at any instant. A small simulation of the idea follows.
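
In this sketch, three "programs" share one processor in round-robin fashion. The slice size and workloads are invented, and a real kernel does this with hardware timer interrupts rather than a Python loop.

    from collections import deque

    # Round-robin timeslicing: each program gets a small slice in turn,
    # giving the appearance that all of them are running at once.
    SLICE = 3   # units of work per turn (invented)

    ready = deque([("A", 7), ("B", 5), ("C", 9)])   # (name, work remaining)
    while ready:
        name, left = ready.popleft()
        done = min(SLICE, left)
        print(f"{name} runs {done} unit(s)")
        if left - done > 0:
            ready.append((name, left - done))       # back of the line
        else:
            print(f"{name} finished")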

Multiprocessing implies that the computer hardware actually has two or more ALUs, that is, that it can indeed carry on two independent processing streams at the same time. This is also called parallel processing. Large mainframes were designed with two processors sharing common memory and I/O, but carrying on independently most of the time. This technique has been expanded greatly, and is used in current microprocessors where there are several types of process going on simultaneously.

Multitasking is a method by which one system and one user can accomplish multiple tasks in a circular fashion by multiprogramming and time slicing. What makes multitasking different is that it implies one user, whereas traditional multiprogramming with timesharing assumed multiple users.

Real Time Systems are those that deal with program execution in step with the actual passage of time in seconds or microseconds. In a traditional batch system, a program execution was started, and the time it took to run depended on the bulk of data to be processed and the size of the program in general. In real-time systems, however, the idea is to keep track of the actual passage of time and interlock the execution of the program with it. This is used in industrial and control applications, where a valve needs to be open for only five seconds, or a motor needs to come up to speed in ten seconds.
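
The interlock with real elapsed time can be sketched as a clock-driven loop; here is the five-second valve from the example, where open_valve and close_valve are hypothetical stand-ins for real I/O.

    import time

    # Keep a valve open for five seconds of real elapsed time.
    # open_valve/close_valve are hypothetical stand-ins for real I/O.
    def open_valve():  print("valve open")
    def close_valve(): print("valve closed")

    open_valve()
    deadline = time.monotonic() + 5.0
    while time.monotonic() < deadline:
        time.sleep(0.01)      # other control work could happen here
    close_valve()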

Embedded Systems are those where the hardware and/or software are placed inside a large mechanism, or are designed to work independently at a remote distance from their control point. These are common in robotics, industrial control, and environmental control systems.

Virtual memory is memory that does not exist, but can be used anyway. You can put data into it, keep it there, and get it back when you need it. However, there are no memory chips involved. OK, so where is the data stored? It is stored on disk. OK, so why the big deal - why not just call it disk storage? The answer is not in the physical way data is stored, i.e. the disk drive, but in the logical way that the data appears to be stored by the program.

Virtual memory requires special hardware that allows addressing information to be converted to disk drive locations on the fly, along with an operating system that monitors the need for this conversion and causes it to occur. The advantage is that the programmer writes as if the memory available to him/her were virtually infinite in size. The programmer is not hampered by a small real memory in the machine. In the old days, if the programmer needed to process more data than would fit into the system's real memory, the saving of data onto disk and the retrieving of it later was up to him/her; there were no provisions in the hardware or software to ease the situation. Now, however, if a programmer needs to store 8 megabytes of data on a machine with only 4 megabytes of memory installed, the virtualizer of the system will take over and allow the 8 megabytes to appear to be stored, even though only a small amount of the data may be in memory at any one instant. This is a major service to programmers working on large systems.

Swapping of data to and from disk comes into play in multiprogramming or virtual memory systems where the demands for storage made by the current programs exceed the available memory. The real system memory is divided into "page frames", usually 4 kilobytes in size; these are sometimes referred to as swap blocks. The system will look at the frames in memory and find the one that has been least recently used (the LRU rule). That 4K block is swapped to a special area on disk, and the frame is then made available to the needing program. If the data in the page to be swapped has not changed since it was brought into memory from disk, the "dirty bit" is clean (no change was made), and the new data coming in can be written over the old directly. If the dirty bit is on, indicating that data in that frame had changed, the data is stored first, that is, swapped out to the disk. A sketch of the rule follows.
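
This sketch plays out the LRU rule with a dirty bit. The frame count and the reference string are invented, and a real virtualizer lives in hardware and the kernel, not in application code.

    from collections import OrderedDict

    # LRU page replacement with a dirty bit; 3 page frames (invented).
    FRAMES = 3
    memory = OrderedDict()    # page -> dirty? ; order = recency of use

    def touch(page, write=False):
        if page in memory:
            dirty = memory.pop(page) or write     # refresh recency
        else:
            if len(memory) == FRAMES:             # no free frame: evict LRU
                victim, victim_dirty = memory.popitem(last=False)
                if victim_dirty:
                    print(f"swap out page {victim} (dirty)")  # write to disk
                else:
                    print(f"drop page {victim} (clean)")      # just overwrite
            print(f"swap in page {page}")
            dirty = write
        memory[page] = dirty

    for page, write in [(1, False), (2, True), (3, False),
                        (1, False), (4, False)]:
        touch(page, write)
    # Page 2 is least recently used when page 4 arrives, and it is dirty,
    # so it must be written back to disk before its frame is reused.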

It is possible for a system to have so little memory and so many requests for service that it spends all its time swapping blocks and no time processing. Nothing is accomplished, even though the system is working very hard. This is called thrashing. An example of swapping and thrashing can be seen in Windows when a machine with a small amount of RAM starts a large program: the system becomes very slow, with the drive light on all the time.

Examples

The Microsoft Disk Operating System (MSDOS) is an operating system that is very common and well known in the small computer area. It provides sufficient operating flexibility to support most commonly needed application programs, and indeed has a lot of features that have been little used. It works best in smaller systems where character-based use is the norm. PC DOS from IBM is one OEM's version of MSDOS, modified at the lower levels to fit the needs of the IBM product line. Today, MSDOS is used primarily in embedded and development systems that do not need a graphical user interface.

Windows 3 is a shell that rides on top of MSDOS, providing a GUI via the mouse and screen to allow graphically oriented programs to be used. (The question of whether a program needs to be graphical or not is left to the reader.) Although the idea was to allow the user to have more than one program active at a time, that is, to dummy up what would appear to be multitasking, the result is that a user can interface with only one program at a time, even though more than one window may be visible at once. When wishing to start using a visible program other than the current one, the user selects the new program with the mouse. This causes the window of the new program to come forward and the current program window to recede. There is a great deal more to this, however, as such an action causes a "context switch". All the data of the current program must be taken out of the lower 640K of RAM and stored either in high memory or on disk (swap blocks). The data and code for the new program must then be loaded from disk or high RAM into the bottom 640K of the system before it can take over. This swap can take many seconds in some cases.

Windows NT and OS/2 began as the same product, a cooperative effort between Microsoft and IBM in the mid-1980's. They cooperated in its development as OS/2 until a falling out occurred between the companies. IBM carried on the project with OS/2 version 2, while Microsoft did things somewhat differently as Windows NT. The two operating systems work very much the same. Both take advantage of the Intel 80386 processor design and its derivatives (the 486, the Pentiums, etc.), which support multitasking, virtual memory, protected mode, and virtual real mode. Both work very well as network server platforms. Because of the preponderance of Windows in desktop business applications, OS/2 has pretty well vanished from the scene except in IBM mainframe installations.

Windows 95 is a Microsoft product intended to replace Windows 3 in the home and small business environments. It makes use of the hardware of the '386 to support multitasking, unlike Windows 3, which was essentially a single-tasking system. It has extended the file system of Windows 3 to accommodate long file names, but this is a kluge, not a better way, and may not be a good idea in the long run.


Windows 98 is an upgrade of Windows 95 and provides a more reliable platform for programming. Windows 2000 and XP are later generations of Windows NT, with extensive additions to implement server requirements. Windows XP is supposedly all 32-bit; that is, there is nothing left of the original 16-bit programming that was typical of earlier Windows systems.

UNIX is an operating system first developed at the Bell Labs of AT&T as an in-house product. It has been widely used in universities and in engineering and scientific circles, and is the primary system found on the internet. It is user-unfriendly, and has a steep learning curve for the new user. It is used in heavy graphics environments such as CAD and engineering. Most servers on the world-wide web are UNIX based, since all of the Internet protocols were developed on UNIX systems. AIX is the IBM version of UNIX.

LINUX is an open-source, free UNIX clone that was developed and runs on Intel systems. The open-source initiative is extensively supported world-wide, and this is the primary operating system to support these efforts.

OS X (Operating System 10) is the latest Apple Macintosh operating system. It is based on BSD UNIX and is a major departure from the earlier Macintosh systems. It is the first Mac system to have a true command-line interface in addition to the usual Mac GUI.

VMS is the general name of the systems used in the Digital Equipment Corporation VAX series of computers. VM/IS is one of the systems used in IBM mainframes. Both of these provide multiprogramming, multitasking, timesharing and a wide variety of development tools.

Solaris is the UNIX operating system for computers built by Sun Microsystems. In larger commercial installations, Sun machines run a large share of the world's web servers. The latest version, Solaris 9, is extremely secure and runs on both Sun and Intel systems.

APPLICATIONS SOFTWARE

Applications software and programs are those that provide the average user with some sort of desired result. These can be anything from accounting reports, business letters, graphical designs, and engineering drawings to data exchange between offices and information lookup. In all cases, the expectation is that 1.) the user will be directly interacting with the software, and 2.) the software will need the help of the operating system beneath it to get the job done.

Applications programs can be divided into two broad areas, horizontal and vertical. Horizontal applications are those that can be used by a wide variety of people and businesses; they are not business-specific. Word Processing is the classic example. Word processing for the preparation of printed documents can be used to generate business letters and forms, school homework, personal letters, documentation and record keeping, etc. So, a horizontal application is of a general nature, with no particular user in mind.

Vertical applications are used only by certain people or types of businesses. A billing system for a doctor's office is a good example. This system can keep track of visits, patient histories, medical procedures, and anything else of a record-keeping nature that would be common to that location. It would not be of benefit, however, to an engineer who wishes to put together cost estimates for a building.

A question that regularly arises in users' minds is whether to purchase programs that are already written, or to write them yourself or have them written to your specifications. Packaged programs have several advantages: they have already been debugged, they adhere to standards, they use conventional file formats, and technical support is available from someone who (hopefully) knows a lot about them. It would be unwise to write yet another word processor, for example, since there are many already available that have universal acceptance, work well, and are ready to go. No need to reinvent the wheel.

Custom programs are those written by the user, or by the user's contractor, to certain specifications. Typically, these would not have universal appeal. They might be used only in certain places or conditions, or to record data or solve problems that belong to a certain job or need only. Examples include data management for a particular kind of business (Federal Express), a certain highly specialized need (Space Shuttle), or cases where the user wishes to learn how to program or to arrange data a certain way.

Examples

Examples of application programs include:

• Word processors that allow the organized writing and printing of documents, letters, etc., (Word Perfect, Word)

• Desktop publishers, which allow word processing to take advantage of page layout including graphics, word art, and embedded pictures (PageMaker, FrameMaker, etc.)

• Spreadsheets, which allow numbers to be related to each other in an X-Y matrix, usually used in accounting, but usable in many other areas including engineering and sciences (Lotus, Excel)

• Drawing programs that allow graphic arts to be created using the computer as easel and palette (Corel, Micrografx Designer)

• CAD (Computer Aided Design) programs that permit more accurate graphics to be created, including scaling and 2D and 3D presentation, and image rotation (AutoCad)

• Simulation programs that allow engineers to construct circuits or processes on screen and then execute a simulated version of the process or circuit (Multisim)

The Art of Programming

Of course it is debatable whether programming is a true art, but it can be shown over and over that some people can quickly develop elegant programs that run smoothly with a minimum of system resources, while others end up with clunky messes that never work quite right.

Suppose that a company wishes to replace an old computer and accounting system with new equipment and an accounting package that can be expanded as the company grows. The proper procedure to accomplish this would include

1. An analysis of the way the company does accounting now, followed by an analysis of how it can be improved and made extensible;

2. A planning or laying out of the big steps needed to implement the new ways, using block diagrams and "big picture" methods;

3. A generation of a series of detailed plans that define the specific steps needed to implement the block diagram blocks;

4. The writing of program code in a selected language that implements the details of step 3;

5. Trying the program modules out, one at a time, and correcting problems;

6. Integrating the modules and correcting problems;

7. Putting the programs into place, running along with the old system to compare results;

8. Upgrading and expansion as needed.

Each of these steps may be done by different people, depending on the size of the company and of the job. The first person involved is the system analyst, who is responsible for steps 1 and 2, and who oversees much of the rest. The analyst is a specialist in one or more areas, in this case corporate accounting, who is also skilled in the use of computers to solve problems. The first qualification is the more important: the analyst is an accountant first, and a computer wizard second, not the other way around. It is possible that the analyst may never see the computer at all; most of his/her work is done at a desk with pencil and paper. The result of the analysis is called a system flowchart.

The system flowchart is then broken down into a series of flowcharts that define the steps needed to implement each of the blocks in the system flowchart. These are called program or detail flowcharts. Again depending on the size of the project, these may be created by the analyst, or by a senior programmer or other staff member. In this step, everything that the computer must do to implement the new system must be logically explained, and the order of execution specified.

The programmer then will convert the detail flowcharts to program language. The language to be used depends on the nature of the project and hardware, the language already preferred and in use in the company, the nature of the job, etc. There are many levels of programmer, from trainee or associate to senior. Generally, the rank is based on years of experience. However, it is the programmers' job to create the original code in the language selected to implement the new system. This is called source code.

Depending on the circumstances, a computer operator may be involved in converting the source code to machine-usable form. The new program is run, and will almost certainly not work. The programmer, operator, and analyst now go into a circular sequence of write, test, crash, and look over, until the product eventually works to satisfaction. This is called debugging, that is, getting the bugs out of the program.

Programmers will also be involved in keeping the programs up to date after the initial product is put into operation. If smart, the people involved will run both the new system and the old in parallel for a while and check that the results agree. Eventually the new system will be declared golden, and the old system discontinued. As time elapses, changes will be necessary to accommodate new business practices and conditions. Modifying a program after it is in place to keep up with such minor changes is called maintenance.

The actual preparation of a program for the system includes the writing of the source code in the selected language, the conversion from source code to object code, further conversion to executable code through linking, and a trial execution on the system.

Taking the detail flowchart as a guide, the programmer writes lines of code in the selected language either into a terminal or PC, or on paper. The lines are captured in ASCII, and must adhere to a set of standards and procedures as defined by that language. These are called the syntax and grammar of the language, and are similar to the syntax and grammar of any language. The position of special characters, spelling of certain words and organization of characters in the lines are all spelled out in the language in use.

The above discussion illustrates top-down design. The idea is that the problem of a new accounting system is approached from the top and detailed downward to the smallest details at the bottom. As the analyst designs the system and works with the senior programmers to implement it on a particular computer with a particular language, they may elect to use structured programming as a method. Structured programming is simply a method of writing code such that every process that depends on another, smaller process expects the smaller process to complete before continuing. Certain languages are particularly suited to this approach. It may also be decided to implement object-oriented programming, which takes advantage of language characteristics to build logically complete blocks called objects that include all their own code and data, and can take on a life of their own within the system.

Languages

There is a hierarchy of computer languages that have developed over the years as technology has improved. Prior to 1964, commercial computers were designed such that the electronic level of logic was the lowest level available. This was called the "red light" level, and it was the point where the electronics via the front panel's red lights and switches were visible and accessible. When interacting with the machine at this level, such as a technician testing the system by entering bits into the switches, it was said that we were interacting at the machine language level. This is the level at which the circuits of the system functioned and implemented the logic.

Each machine language instruction consisted of two parts. The first was the operation code or OP Code, which was the action word that explained what was going to happen to the data. These were typically ideas like "add", "subtract", and "move". The second part was called (incorrectly) the operands. These could include the data itself that would be acted upon, or assist in finding the data in the system, such as by specifying a register or a memory location. Usually, the operands specified where the data was coming from for the operation, and/or where the data was going to when the operation was completed. These are referred to as sources and destinations.

In 1964, IBM introduced the System/360, and with it the idea of the microprogram. This approach treated the registers and data paths of the machine as sources and destinations, much as the machine language had done. However, instead of static electronic circuits built to implement every possible combination of source and destination, the ALU and other system parts were treated dynamically following little instructions or steps in the microprogram. The machine language instruction "add" could be implemented by several microinstructions that caused the data to move through the system dataflow such as to accomplish the effect of an add. So, therefore we have a level of programming below that of machine language, called microprogramming. While clever operators and programmers could deal with the system at the red light level, only technicians are involved with microprogramming.

Above the machine language level we are into symbolic languages. These use ASCII words or character groups, called "symbols", to represent OP Codes, registers, locations in memory, and I/O devices. Some of the character groups are already fixed by the language, and are called reserved words. The idea of a symbolic OP Code is called a mnemonic. It is a group of letters that make it easier to remember the function represented, instead of a bunch of 1's and 0's. Examples include ADD, SUB, MOVE, etc. The operands can also be given names by the programmer. If we use a mnemonic for an OP Code, a reserved word for a register, and assign a name of our own for a data location in main memory, we might get an instruction like

MOV R3,WIDGET

Three types of symbolic languages have evolved. The first was called assembler language. The language and its grammar and syntax are very closely determined by the machine on which it is to run. Its characteristics are:

1. Product specific - assembler languages written for one maker's machine will not work on the product of another company.

2. One line of code equals one instruction. Translation is on a one-to-one basis. The logic of the instructions is simple, limited to single events.

3. Complex operations may be provided by the language, or may need to be created locally in a macro (a toy example follows below).

Assembler is easy to learn (sort of) and is the basis for most operating systems and system-level programming. Its results usually run faster than other types, and it produces small code. It can be very tedious to build large products with it.
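
Here is a toy macro expander, sketched in Python; the SWAP macro and its three-instruction expansion are invented, but they show how one source line can stand for several instructions:

    MACROS = {
        "SWAP": ["MOV {0},TEMP",        # SWAP A,B becomes three moves
                 "MOV {1},{0}",
                 "MOV TEMP,{1}"],
    }

    def expand(line):
        op, _, args = line.partition(" ")
        if op in MACROS:
            return [step.format(*args.split(",")) for step in MACROS[op]]
        return [line]                   # ordinary lines pass through

    for line in ["MOV R3,WIDGET", "SWAP R1,R2"]:
        for instruction in expand(line):
            print(instruction)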

The second type of language is referred to as compiler language. In this case, the symbolics are removed from the hardware of the machine. Its characteristics are:

1. Non-product specific - FORTRAN 90 will run almost without modification on any system that supports it.

2. One line of code can create large quantities of instructions. The logic can be complicated and involve multiple operations.

3. Complex functions are usually built-in, and can be used frequently throughout the code.

Compiler languages take a lot of computer resources in disk and memory. The compilation process can be lengthy. The results can be difficult to debug, yet elegant and effective.

The third type of language is called interpretive. In this case, the lines of code are translated one at a time, and immediately executed. This resembles compiler languages; however, the execution of a single line immediately following translation requires certain coding practices and adjustments. The downside of interpretive languages is that they execute slowly, and are not very good for final production software. The upside is that since a programmer can make a change on one line and then immediately test it, development time can be minimized.
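
A minimal sketch of the translate-then-execute-immediately cycle, in Python, using an invented three-command language:

    program = ["LET X 5",
               "ADD X 3",
               "PRINT X"]

    variables = {}
    for line in program:                # one line at a time
        op, name, *rest = line.split()  # "translate" the line
        if op == "LET":                 # ...then execute it at once
            variables[name] = int(rest[0])
        elif op == "ADD":
            variables[name] += int(rest[0])
        elif op == "PRINT":
            print(name, "=", variables[name])    # prints: X = 8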

Once a program in an assembler or compiler language has been written into its source form, a file exists that contains the lines of ASCII characters in the proper grammar and syntax. In the case of assembler language, a second program called the Assembler is used to translate the ASCII source to binary object code. This may take one or two passes through the source code (one- or two-pass assemblers). The object code that results is the binary equivalent of the source, but with certain missing parts.
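
A sketch of the two-pass idea in Python; the OP codes, label syntax, and one-word-per-line layout are all invented. Pass one assigns an address to every label; pass two translates each mnemonic and resolves label references. Note that WIDGET, presumably defined in some other module, stays symbolic - one of the "missing parts" discussed below:

    OPCODES = {"MOV": 0x1, "ADD": 0x2, "JMP": 0x3}

    source = ["START: MOV R3,WIDGET",
              "       ADD R3,R1",
              "       JMP START"]

    symbols = {}                                  # pass 1: label addresses
    for address, line in enumerate(source):
        if ":" in line:
            symbols[line.split(":")[0]] = address

    for line in source:                           # pass 2: translate
        mnemonic, operand = line.split(":")[-1].split()
        operand = str(symbols.get(operand, operand))
        print(format(OPCODES[mnemonic], "04b"), operand)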

In the case of a program written in a compiler language, the same process is followed. The source code is given to a program called a compiler, which converts the source statements into blocks of binary coding. This is more complicated than with assemblers, because compiler languages allow more complex statements and logic and are not as restricted by the hardware. The resulting object code, like that of the assembler, has missing parts.

The "missing parts" of the object code files are the result of the fact that the programming methods used today make allowances for the fact that a complicated programming project must assume that the code is written in parts, by different people, at different times, in different places. Bringing these

Bringing these parts together into a common whole requires allowances for how the final product will fit together. Most important here is the allocation of main memory. Since the author of one module may not have the details of the modules written by others, final memory assignments must be deferred until all of the modules can be fitted together at once. This requires the use of a program called a linker, which links the object code modules from the assemblers or compilers into a final executable file. This file contains everything needed for the program to execute in the operating system environment provided, including how to find memory references made by various programmers in different places.
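
A toy sketch of linking, in Python, with all module names, sizes, and symbols invented: each object module reports what it defines and what external names it uses; the linker lays the modules out in memory, builds one symbol table, and patches every reference:

    modules = [
        {"name": "main", "size": 2, "defines": {},            "uses": ["WIDGET"]},
        {"name": "data", "size": 1, "defines": {"WIDGET": 0}, "uses": []},
    ]

    base, bases = 0, {}                  # step 1: lay modules out in order
    for m in modules:
        bases[m["name"]] = base
        base += m["size"]

    symbols = {}                         # step 2: global symbol table
    for m in modules:
        for name, offset in m["defines"].items():
            symbols[name] = bases[m["name"]] + offset

    for m in modules:                    # step 3: patch every reference
        for name in m["uses"]:
            print(m["name"], "uses", name, "at final address", symbols[name])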

Examples:

• MASM (Macro Assembler) - The assembler language from Microsoft for the PC.

• TASM (Turbo Assembler) - The assembler language from Borland Intl. for the PC.

• FORTRAN (FORmula TRANslator) - The first compiler language for computers, developed by IBM in the 1950's and still in use for scientific and computational work.

• COBOL (Common Business Oriented Language) - A compiler language for business data processing and accounting.

• BASIC (Beginner's All-Purpose Symbolic Instruction Code) - Created at Dartmouth as a means of teaching the use of computers to non-computer students, e.g. students in the Social Sciences. An interpretive language, it has been widely used in microprocessor-based machines. It encourages bad programming practices, however, and has been largely abandoned.

• PASCAL (after Blaise Pascal) - A language developed in the 70's in Switzerland to assist in teaching computer logic to beginning computer science students. It was brought to the US by the University of California at San Diego, and issued by them as an interpretive language with an accessible intermediate code called pseudocode or p-code. It has replaced BASIC as the initial programming language for colleges and universities. It is available as both a compiler and an interpretive language.

• ADA (after Ada Augusta Byron) - A language designed and specified by the US Department of Defense as a common standard for military projects. It was originally written in PASCAL.

• RPG (Report Program Generator) - A simple small-system accounting and business language.

• C - Created by Bell Labs of AT&T as a language to support development in UNIX. It is a compiler language, and has become the language of choice for systems development in microprocessor-based systems, or where UNIX is involved. It is heavily used in GUI development.

• C++ - A superset of C, this language adds extensive object-oriented functions and support.

• SQL (Structured Query Language) - A 4GL approach to creating database queries via a common language idiom, into which other programs can link if the SQL environment is supported.

• APL (A Programming Language) - An interpretive language that uses a variety of special symbols to implement primitive mathematical or logical functions. It allows complex relationships to be defined with a few characters, and the program interpreting these can execute complicated statements quickly.

• JAVA - A partially compiled language that is intended to be "platform independent". Created by Sun Microsystems, it is designed to allow intelligent mini-applications, or applets, to run under the supervision of an internet browser such as Netscape.

• PERL - An interpretive language heavily used in web servers to implement CGI, the technique of allowing a user to interact with a web page. It is primarily a text-processing language, but can do math, network communication, and disk processing as well. Heavily used in UNIX and LINUX systems.

Last but not least, don't forget that the job isn't done until the paperwork is finished. This means that it is essential to document your work, with full comments in the source code and detailed written procedures for its use. Do not assume that a programming language is "self documenting". There is no such thing, and nothing is better than an accurate document to explain how a system works.