Lecture № 1
Introduction to Computer Organization and Architecture. 1. Notions of Computer Organization and Architecture.
2. Functions of the Computer System (CS):
Data processing;
Data storage;
Data movement;
Control.
3. Structure of the CS (hierarchy levels).
4. Multilevel computer organization.
Literature.
1. Stallings, W. Computer Organization and Architecture: Designing for Performance, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
2. Hamacher, V.C., Vranesic, Z.G., Zaky, S.G. Computer Organization, 4th ed. McGraw-Hill International Editions, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
Key words.
Architecture, structure, organization, function, instruction, coding, interface, heritage, processing, storage, movement, control, peripherals, Central Processing Unit (CPU), Main Memory, System Interconnection (System Bus), Input, Output, Register, Arithmetic and Logic Unit, Control Unit, Sequencing Logic, Decoder.
Definition 1. The Architecture of a Computer System (CS) is a specification of its interfaces which determines data processing and includes the methods of data coding, the system of instructions, and the principles of software-hardware interaction. It can also be defined as the set of information that is necessary and sufficient for programming in machine code.
Definition 2. The operational units and their interconnections that realize the architecture of the CS constitute the Organization of the CS.
- All members of the Intel x86 family share the same basic architecture.
- The IBM System/370 family shares the same basic architecture.
- This gives code compatibility and software succession.
- Organization differs between different versions.
- Architecture is more conservative than organization.
Structure is the way of combining (uniting) the components of a subsystem into one whole unit.
Function is the operation of an individual component as a part of the structure.
All computer functions are: 1. Data processing; 2. Data storage; 3. Data movement; 4. Control.
Functional view of a computer.
Figure: the four internal components (the data movement apparatus, the control mechanism, the data storage facility, and the data processing facility) exchange data with the operating environment, the sources and destinations of data.
The same four components serve the four basic operations:
Operation (1): data movement, e.g. from keyboard to screen.
Operation (2): storage, e.g. an Internet download to disk.
Operation (3): processing from/to storage, e.g. updating a bank statement.
Operation (4): processing from storage to I/O, e.g. printing a bank statement.
Structure - Top Level.
Figure: the computer among its peripherals and communication lines. Top-level components:
- Central Processing Unit (CPU): manages the functioning of the system and executes the functions of data processing;
- Main Memory (MM): stores the initial data and all information necessary for data processing;
- System Interconnection (System Bus): the mechanism that provides data interchange among the CPU, MM and I/O;
- Input/Output: relocates data between the computer and the environment in both directions.
Structure - The CPU.
Figure: within the computer (CPU, Memory and I/O joined by the System Bus), the CPU consists of:
- Registers: store operative information during the CPU's execution of the current operation;
- Arithmetic and Logic Unit (ALU): executes all operations concerned with data processing proper;
- Internal CPU Interconnection: the mechanism that provides the joint work of the CPU's components;
- Control Unit: controls the functioning of the CPU's components.
Structure - The Control Unit.
Figure: within the CPU (the Control Unit works beside the ALU and the registers on the internal bus), the Control Unit consists of:
- Sequencing Logic: serves for the execution of concrete actions; it has a finite set of internal states and a finite set of input values;
- Control Unit Registers and Decoders: a decoder transforms an n-bit input binary word into a unique signal on one of the 2^n outputs of the circuit (see the sketch below);
- Control Memory: stores the data (the microprogram as a whole) that are directly used by the ALU and by the CU itself.
Sometimes the Control Unit is realized as a set of gates.
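The decoder just described turns an n-bit code into a signal on exactly one of 2^n output lines. A minimal sketch of that behavior in Python (purely illustrative; a real decoder is a combinational circuit, not software):

```python
# Sketch of the decoder described above: an n-bit binary word selects
# exactly one of the 2**n output lines.

def decode(word_bits):
    """word_bits: a list of n bits (0 or 1). Returns the 2**n output
    lines, with a 1 only at the position encoded by the input word."""
    n = len(word_bits)
    index = 0
    for bit in word_bits:      # read the bits as one binary number
        index = index * 2 + bit
    outputs = [0] * (2 ** n)
    outputs[index] = 1         # the unique activated output line
    return outputs

print(decode([1, 0]))  # n = 2: input "10" activates line 2 -> [0, 0, 1, 0]
```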
Multilevel computer organization. The electronic circuits of every computer can recognize and execute a limited set of simple (primitive) instructions. That is why all programs must be transformed, before execution, into a sequence of these primitive instructions. Together the primitive instructions compose the language in which people communicate with the computer; such a language is called a machine language. Using a machine language is tiresome and difficult. To overcome these difficulties, a series of abstraction levels was constructed, each higher-level abstraction built on top of the lower one (here an abstraction means a set of instructions convenient for people). This approach is called multilevel computer organization.
Languages, levels and virtual machines. Let the new (more convenient for people) instructions together form a language L1, and denote the machine language by L0 (the computer can execute only L0 instructions). In order to run a program written in the language L1, it is necessary to replace each instruction of this program by an equivalent set of instructions in
the language L0. As a result we obtain a program that the computer can execute. This technique is called translation.
Now assume there is a special program (written in L0) that takes programs in L1 as input data, examines each instruction in turn, immediately substitutes the equivalent set of L0 instructions, and executes them. This technique is called interpretation (and the program that performs it is called an interpreter).
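To make the two techniques concrete, here is a minimal sketch in Python of a toy language "L1" with two instructions; the instruction names and the use of Python statements as the "L0" target are illustrative assumptions, not part of the lecture:

```python
# Toy contrast of translation vs. interpretation.
# L1 has two instructions; Python statements stand in for "L0".

L1_PROGRAM = ["INC", "INC", "DEC"]   # a program written in L1

def translate(program):
    """Translator: replace every L1 instruction by equivalent L0 code,
    producing a whole L0 program that is executed later, on its own."""
    l0_lines = ["x = 0"]
    for instr in program:
        l0_lines.append({"INC": "x = x + 1", "DEC": "x = x - 1"}[instr])
    return "\n".join(l0_lines)

def interpret(program):
    """Interpreter: examine each L1 instruction in turn and carry out the
    equivalent L0 actions immediately; no L0 program is produced."""
    x = 0
    for instr in program:
        x += 1 if instr == "INC" else -1
    return x

namespace = {}
exec(translate(L1_PROGRAM), namespace)       # run the translated program
assert namespace["x"] == interpret(L1_PROGRAM) == 1
```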
Let us imagine that there exists a virtual machine whose machine language is L1; denote it M1, and denote the virtual machine with language L0 as M0. In fact M1 could be built, but only at great expense. Thus it is possible to write programs for virtual machines without worrying about translation and interpretation. Moreover, it is possible to create languages that are increasingly oriented toward people: L2, L3, . . . , Ln, which are the machine languages of the virtual machines M2, M3, and so on. The invention of new languages may continue until the last one satisfies the user's demands. Each of these languages uses the previous one as a base, which is why a computer can be considered a system consisting of a series of levels.
There is an important relation between a language and a virtual machine. Every machine has a certain machine language; in essence, the machine determines the language. We will use the terms "level" and "virtual machine" as synonyms. It is important to remember that only programs in L0 can be
executed by the computer without translation. Programmers are usually interested only in the language Ln; but to understand how a computer works, it is necessary to know all the levels.
Contemporary multilevel machines.
The majority of contemporary computers comprise two or more levels (up to six). Level 0 is the hardware: its electronic circuits execute programs in the language of level 1. (In fact there is one more level below level 0, the level of physical devices, but it belongs to the sphere of electronic engineering and is not considered here.)
The zero level is the digital logic level. Its objects are gates; every gate is built from several transistors; a group of gates forms 1 bit of memory; bits of memory are combined into groups that form registers.
The first level is the microarchitecture level. On this level there is a set of 8 or 32 registers, which form a local memory, and an ALU (arithmetic-logic unit), which performs simple arithmetic and logic operations. The registers together with the ALU form the data path
(the data path operation consists of selecting one or two registers, letting the ALU perform an operation on their contents, and placing the result in one of the registers). The data path may be controlled by a special microprogram or by hardware. In machines where the data path is controlled by software, the microprogram is an interpreter for the instructions of level 2.
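A minimal sketch of one such data-path cycle in Python (the register names and the operation table are illustrative assumptions):

```python
# One register-to-register micro-operation through the ALU: select one or
# two registers, apply an ALU operation, and write the result back.

registers = {"R0": 7, "R1": 5, "R2": 0}

ALU_OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
}

def datapath_cycle(op, src1, src2, dst):
    """Perform one data-path cycle and return the result."""
    result = ALU_OPS[op](registers[src1], registers[src2])
    registers[dst] = result          # place the result in a register
    return result

datapath_cycle("ADD", "R0", "R1", "R2")   # R2 <- R0 + R1
assert registers["R2"] == 12
```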
The second level is the level of the instruction set architecture. It includes the instructions that are executed by the microprogram interpreter or by hardware.
The third level is called the operating system level. It has a hybrid character: it may include instructions that also belong to the lower levels. Its peculiarities are the presence of new instructions, the use of a different memory organization, the ability to execute several programs simultaneously, and others. The new facilities of the third level are executed by an interpreter running on the second level, which was once named the operating system. Thus one part of the third-level instructions (the new ones) is interpreted by the operating system, and the other part (instructions identical to second-level instructions) is interpreted by the microprogram; that is why the level is hybrid.
The fourth level is the assembly language level. This level provides a symbolic (not numeric) form of one of the lower-level languages. On this level it is possible to write programs in a form acceptable to people. These programs are first translated into one of the languages of levels 1, 2 or 3, and after this they are interpreted by the corresponding virtual or actually existing machine (most programs of the fourth level are supported by a translator; programs of the second and third levels are interpreted). The program that fulfills the translation is called an assembler.
The fifth level is the level of High-Level Languages. It consists of languages created for application programmers. Programs in such languages are usually translated into the 4th or the 3rd level. The translators that process these programs are called compilers (sometimes the method of interpretation is used instead; e.g. programs in the Java language are usually interpreted).
Inference: a computer is designed as a hierarchical structure of levels, each of which is built on top of the preceding one. Every level represents a certain abstraction, with its own objects and operations.
The set of data types, operations, and features of every level is called its architecture.
At the end of the 1950s IBM decided that manufacturing families of computers, every member of which executes the same instructions, had many advantages both for the company and for its customers. To describe the level of compatibility of such computers, IBM introduced the term architecture. A new family of computers was to have one common architecture and many different implementations, differing in price and speed (while all of them could run the same programs). This was achieved with the help of interpretation (a technique proposed by Wilkes in 1951). Hardware execution without interpretation was used only in the most expensive computers.
Development of multilevel machines. Hardware consists of tangible objects: integrated circuits, printed circuit boards, cables, power supplies, storage devices, and input/output devices.
Software consists of detailed sequences of instructions and their computer representations (i.e. programs).
At first the border between hardware and software was distinct; over time it has become blurred.
In reality, hardware and software are logically equivalent: any operation executed by software can also be built into hardware (advisably after it has been worked out), and vice versa.
The decision to divide functions between hardware and software is based on such factors as cost, speed, reliability, and the frequency of anticipated changes; there are few hard rules that determine what must belong to hardware and what to software.
Figure: a computer with six levels. From top to bottom:
Level 5 - the high-level language level (programs are translated by a compiler);
Level 4 - the assembly language level (programs are translated by an assembler);
Level 3 - the operating system level;
Level 2 - the instruction set architecture level (instructions are interpreted by a microprogram or executed directly);
Level 1 - the microarchitecture level;
Level 0 - the digital logic level (the hardware).
Data are the basic elements of information, such as numbers, letters, symbols, and so on, which are processed by a human or a computer (or by some other machine). [Sometimes information itself, prepared for certain purposes in a special form, is considered to be data.]
Information is the meaning assigned to data.
Format is a way of representing data, or a scheme of data arrangement.
System is a set of material or abstract objects that are considered together as a single whole and that have been united to achieve certain results.
Computer System is a device or a complex of devices intended for the mechanization or automation of data processing, constructed on the basis of electronic elements (transistors, logic circuits, magnetic elements, and so on).
Analog Computer is a computing device that processes data given in the form of continuously changing physical values whose magnitudes can be measured (such values may be angular or linear displacements, electric voltage, electric current, time, and so on). These analog values are processed by mechanical or other physical methods, and the results of such operations are measured. Computers of this type are usually used for solving equations that describe processes in real time, when the initial data are input from special measuring instruments.
[Digital Computer is an electronic computing device that receives discrete input data, processes it in accordance with a list of instructions stored inside it, and generates resulting output data. (Instructions may be considered a special type of data, coded in correspondence with a format; these instructions: a) manage data transfer both within the computer itself and between the computer's internal and peripheral (input-output) devices; b) determine the arithmetic or logic operations to be performed.)]
[Hybrid Computer is a computing system in which elements of analog and digital computers are combined. Such computers solve equations using analog devices, while digital devices are used for storage, further processing, and presentation of results.]
Configuration of a Computer System is the concrete composition of hardware devices and the interconnections among them used during a certain period of time. It determines the character of the system's work (a special program in the Computer System allows the composition to be changed within available limits).
Hardware consists of tangible (palpable) objects: integrated circuits, printed circuit boards, cables, memory devices, printers, and other technical devices and physical equipment.
Software is the detailed instructions that control the operation of a computer system.
Interface is:
(1) a connection between two processing components;
(2) a complete set of agreements (a language, in the common sense) concerning input and output signals, by means of which the following pairs of data processors may exchange data: computer device and computer device; program and program environment; human being and data-processing system; and some others. These agreements are called protocols. Protocols are sequences of technical requirements that must be met by the designers of a device so that its work is successfully compatible with that of other devices.
Questions for Quiz № 1 and № 2
1. In your own words explain the following notions (concepts) and give examples:
a) data, information, format;
b) computer (analog, digital, hybrid);
c) hardware, software, computer configuration;
d) function, structure, interface;
e) architecture, organization.
2. List the major components of a contemporary computer system and indicate their functions.
3. List the operations you use most often when you work with a computer, and explain which of the computer's major components are engaged in executing one of these operations.
4. Analyze the five definitions of computer architecture given below. Which of these definitions corresponds most closely to the officially accepted one? (Give a detailed explanation.)
1) "The design of the integrated system which provides a useful tool to the programmer" (Baer).
2) "The study of structure, behavior and design of computers" (Hayes).
3) "The design of the system specification at a general or subsystem level" (Abd-Alla).
4) "The art of designing a machine that will be a pleasure to work with" (Foster).
5) "The interface between the hardware and the lowest level software" (Hennessy and Patterson).
5. Name the minimal number of levels of a virtual machine that can execute all main computer functions (give an explanation).
6. What is the difference between a translator and an interpreter?
7. Why are computer hardware and software considered logically equivalent?
List all operations (draw up a sketch) which are to be performed by a Computer System for:
1. deleting a record;
2. editing data;
3. correcting data on the hard disk;
4. copying data from the hard disk to a CD;
5. printing a text from a CD;
6. searching for a file on the hard disk;
7. searching for a file on the Internet;
8. copying a file from the Internet to the hard disk;
9. correcting data on a diskette;
10. renaming a file on the hard disk;
11. archiving a file on the hard disk;
12. archiving a file on a diskette;
13. unarchiving a file on the hard disk;
14. installing a program from a CD;
15. deleting a record in a file on a diskette;
16. printing a text from a site on the Internet;
17. archiving a file on the hard disk and copying it to a diskette.
Lecture №2
Computer Evolution and Performance
1. The Electronic Era of Computers, Generation I.
2. Structure of the von Neumann machine.
3. Structure of IAS.
4. Generations II, III and IV. Moore's Law.
Literature.
1. Stallings, W. Computer Organization and Architecture: Designing for Performance, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
2. Hamacher, V.C., Vranesic, Z.G., Zaky, S.G. Computer Organization, 4th ed. McGraw-Hill International Editions, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
ENIAC – background
- ENIAC: Electronic Numerical Integrator And Computer
- Built by Eckert and Mauchly at the University of Pennsylvania
- Purpose: trajectory tables for weapons
- Started 1943, finished 1946 (too late for the war effort)
- Used until 1955
ENIAC - details
- Decimal (not binary)
- 20 accumulators of 10 digits
- Programmed manually by switches
- 18,000 vacuum tubes
- 30 tons; 15,000 square feet
- 140 kW power consumption
- 5,000 additions per second
von Neumann/Turing
- EDVAC (Electronic Discrete Variable Automatic Computer)
- Stored-program concept
- Main memory storing programs and data
- ALU operating on binary data
- Control unit interpreting instructions from memory and executing them
- Input and output equipment operated by the control unit
- Princeton Institute for Advanced Studies: the IAS machine, completed in 1952
Key Concepts of the von Neumann Architecture.
Data and instructions are stored in a single read-write memory subsystem;
The contents of this memory are addressable by location, without regard to the type of data contained there;
Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next.
Structure of the von Neumann machine.
Figure: Main Memory (M), the Arithmetic and Logic Unit (ALU), the Program Control Unit (PCU), and the Input/Output Equipment.
- Main Memory (M): contains the data and the instructions;
- ALU (with its Accumulator): processes data presented in binary form;
- Program Control Unit (PCU): analyses the program's instructions fetched from M and organizes their execution;
- Input/Output Equipment: works in accordance with signals coming from the PCU.
IAS - details
- 1,000 x 40-bit words (1,000 cells, each cell holding 40 bits)
- Binary numbers
- 2 x 20-bit instructions (two instructions stored in the same cell)
- Set of registers (storage in the CPU):
- Memory Buffer Register (MBR): stores a word that is to be put into memory or has just been taken out of memory.
- Memory Address Register (MAR): stores the address of the memory cell accessed for a write or a read.
- Instruction Register (IR): stores the 8-bit operation code of the current instruction during its execution.
- Instruction Buffer Register (IBR): serves for the temporary storage of the right-hand instruction of a fetched word.
- Program Counter (PC): stores the address of the next instruction word to be fetched.
- Accumulator (AC) and Multiplier Quotient (MQ): serve for the temporary storage of operands and results in the ALU.
IAS word formats. A number word is 40 bits: the sign bit (bit 0) followed by the 39 bits of the number. An instruction word holds a left instruction (bits 0-19) and a right instruction (bits 20-39); each instruction consists of an 8-bit operation code (bits 0-7 and 20-27) and a 12-bit address (bits 8-19 and 28-39).
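A sketch of how such a word splits into its fields, in Python (the sample opcode and address values are arbitrary; bit 0 is taken as the most significant bit, as in the format above):

```python
# Unpack a 40-bit IAS instruction word into its two 20-bit instructions,
# each an 8-bit opcode plus a 12-bit address, per the format above.

def unpack_instruction_word(word):
    """word: an integer holding 40 bits, bit 0 being the leftmost."""
    assert 0 <= word < 2 ** 40
    left = (word >> 20) & 0xFFFFF       # bits 0-19
    right = word & 0xFFFFF              # bits 20-39

    def split(instr):
        opcode = (instr >> 12) & 0xFF   # first 8 bits of the instruction
        address = instr & 0xFFF         # remaining 12 bits
        return opcode, address

    return split(left), split(right)

# Arbitrary example: opcode 0x01 / address 0x123 on the left,
# opcode 0x02 / address 0x456 on the right.
word = (0x01 << 32) | (0x123 << 20) | (0x02 << 12) | 0x456
assert unpack_instruction_word(word) == ((0x01, 0x123), (0x02, 0x456))
```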
Structure of IAS - detail
Figure: the Central Processing Unit (the ALU plus the Program Control Unit) exchanges instructions and data with Main Memory and the Input/Output Equipment, and the Program Control Unit sends control signals.
- ALU: Accumulator (AC), Multiplier Quotient (MQ), Memory Buffer Register (MBR), and the Arithmetic & Logic Circuits;
- Program Control Unit: Instruction Buffer Register (IBR), Instruction Register (IR), Memory Address Register (MAR), Program Counter (PC), and the Control Circuits.
Commercial Computers
- 1947: Eckert-Mauchly Computer Corporation founded
- UNIVAC I (Universal Automatic Computer): calculations for the US Bureau of the Census, 1950
- The company became part of the Sperry-Rand Corporation
- Late 1950s: UNIVAC II (faster, more memory, upward compatible)
IBM
- Punched-card processing equipment
- 1953: the 701, IBM's first stored-program computer, for scientific calculations
- 1955: the 702, for business applications
- Led to the 700/7000 series
Transistors
- Replaced vacuum tubes: smaller, cheaper, less heat dissipation
- A solid-state device, made from silicon (sand)
- Invented in 1947 at Bell Labs by William Shockley et al.
Transistor Based Computers
- Second-generation machines
- NCR and RCA produced small transistor machines
- IBM: the 7000 series
- DEC (Digital Equipment Corporation), founded in 1957, produced the PDP-1
Microelectronics
Literally, "small electronics". A computer is made up of gates, memory cells, and interconnections; these can all be manufactured on a semiconductor, e.g. a silicon wafer.
Generations of Computer
- Vacuum tube: 1946-1957
- Transistor: 1958-1964
- Small-scale integration: from 1965, up to 100 devices on a chip
- Medium-scale integration: to 1971, 100-3,000 devices on a chip
- Large-scale integration: 1971-1977, 3,000-100,000 devices on a chip
- Very-large-scale integration: 1978 to date, 100,000-100,000,000 devices on a chip
- Ultra-large-scale integration: over 100,000,000 devices on a chip
Moore’s Law
- Increased density of components on a chip.
- Gordon Moore, cofounder of Intel, observed that the number of transistors on a chip doubles every year.
- Since the 1970s development has slowed a little: the number of transistors now doubles every 18 months (see the arithmetic sketch below).
- The cost of a chip has remained almost unchanged.
- Higher packing density means shorter electrical paths, giving higher performance.
- Smaller size gives increased flexibility.
- Reduced power and cooling requirements.
- Fewer interconnections increase reliability.

Figure: growth in CPU transistor count.
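The 18-month doubling is simple exponential arithmetic; a sketch in Python, using the Intel 4004's roughly 2,300 transistors (1971) purely as an illustrative baseline:

```python
# Moore's law as arithmetic: transistor count doubling every 18 months.

def projected_transistors(base_count, base_year, year, months_to_double=18):
    """Project the count forward from a baseline year."""
    months = (year - base_year) * 12
    return base_count * 2 ** (months / months_to_double)

for year in (1971, 1980, 1990, 2000):
    print(year, round(projected_transistors(2300, 1971, year)))
```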
IBM 360 series
- 1964: replaced (and was not compatible with) the 7000 series
- First planned "family" of computers: similar or identical instruction sets, similar or identical operating systems
- Across the family: increasing speed, increasing number of I/O ports (i.e. more terminals), increasing memory size, increasing cost
- Multiplexed switch structure
DEC PDP-8
- 1964: the first minicomputer (named after the miniskirt!)
- Did not need an air-conditioned room; small enough to sit on a lab bench
- $16,000, versus $100k+ for an IBM 360
- Embedded applications
- Bus structure
DEC - PDP-8 Bus Structure
Figure: the Console Controller, CPU, Main Memory, and I/O Modules all attach to a single shared bus, the OMNIBUS.
Semiconductor Memory
- 1970: Fairchild produced a chip the size of a single core (i.e. of 1 bit of magnetic core storage) holding 256 bits
- Non-destructive read; much faster than core
- Capacity approximately doubles each year
Intel
- 1971: the 4004, the first microprocessor; all CPU components on a single chip; 4-bit
- 1972: the 8008, 8-bit; both designed for specific applications
- 1974: the 8080, Intel's first general-purpose microprocessor
Speeding it up
- Pipelining
- On-board cache; on-board L1 & L2 cache
- Branch prediction
- Data flow analysis
- Speculative execution
Performance Mismatch
- Processor speed increased
- Memory capacity increased
- Memory speed lags behind processor speed

Figure: DRAM and processor characteristics. Figure: trends in DRAM use.
Definition. The Computer Performance (CP) is determined by the number of certain (well-known) operations performed per unit of time.
The generalized estimation of the CP is the number of transactions per second.
The basic performance characteristics of a computer system are processor speed, memory capacity, and interconnection data rates.
Solutions
- Increase the number of bits retrieved at one time: make DRAM "wider" rather than "deeper".
- Change the DRAM interface: add a cache.
- Reduce the frequency of memory access: more complex caches, and cache on the processor chip (see the estimate sketched below).
- Increase interconnection bandwidth: high-speed buses, and a hierarchy of buses.
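Why a cache reduces the cost of the processor-memory speed gap can be seen from a simplified average-access-time estimate; the hit rate and the latencies in this Python sketch are assumed, illustrative numbers:

```python
# Simplified average memory access time (AMAT): hits are served at cache
# speed, misses at main-memory speed.

def amat(hit_rate, cache_ns, memory_ns):
    return hit_rate * cache_ns + (1 - hit_rate) * memory_ns

print(amat(0.95, 2, 60))   # 4.9 ns: close to cache speed
print(amat(0.50, 2, 60))   # 31.0 ns: a poor hit rate loses the benefit
```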
Def. 1. Register is an area of internal (high-speed) memory for temporarily storing data.
Def. 2. Word (computer word) is a group of a fixed number of symbols (binary digits, bits) that the computer perceives as a single indivisible whole and that has a definite meaning.
Two main units (the ALU and the Program Control Unit) take part in the execution of IAS instructions.
The major components of the ALU are:
1. Registers:
AC, the accumulator, temporarily stores the senior (most significant) 40 of the 80 possible bits that encode an input operand or an obtained result;
MQ, the multiplier-quotient register, temporarily stores the junior (least significant) 40 bits encoding an operand or result;
MBR, the memory buffer register, stores a word that is to be written into memory or has just been fetched from memory.
2. The Arithmetic & Logic Circuits perform the primary arithmetic and logical operations of the computer.
The Program Control Unit includes the following components:
1. Registers:
IBR, the instruction buffer register, intended for storing the right instruction fetched from memory;
IR, the instruction register, which stores the left instruction just fetched;
MAR, the memory address register, intended for storing the address of the word to be written into memory or read from it;
PC, the program counter, which stores the address of the word (holding the left and right instructions) to be executed next.
2. The Control Circuits coordinate and control the other parts of the computer system. They read the stored program (one instruction at a time) and direct the other components of the computer system to perform the tasks required by the program. The series of operations required to process a single machine instruction is called the machine cycle.
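Since each IAS memory word holds two instructions, the fetch part of the machine cycle either takes the buffered right instruction from the IBR or fetches a fresh word through the MAR and MBR. A simplified Python sketch of just that fetch logic (the sample program and opcodes are invented for illustration):

```python
# Simplified IAS fetch cycle: each word holds a left and a right
# instruction; the right one waits in the IBR for the next cycle.

memory = {0: ("LOAD", 10, "ADD", 11),     # invented sample program
          1: ("STORE", 12, "HALT", 0)}

pc, ibr = 0, None    # program counter and instruction buffer register

def fetch():
    """Return the next (opcode, address) pair to execute."""
    global pc, ibr
    if ibr is not None:          # the right instruction is buffered
        instr, ibr = ibr, None
        return instr
    mar = pc                     # MAR <- PC
    mbr = memory[mar]            # MBR <- memory word at MAR
    pc += 1                      # PC now points at the next word
    ibr = (mbr[2], mbr[3])       # IBR <- right instruction
    return (mbr[0], mbr[1])      # left instruction goes to IR/MAR

for _ in range(4):
    print(fetch())   # LOAD 10, ADD 11, STORE 12, HALT 0
```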
Instructions of IAS.
There were 21 instructions in IAS. All of them may be divided into 5 groups:
1. Data transfer instructions, which move data from memory cells to the registers AC or MQ, or vice versa;
2. Unconditional jump instructions;
3. Conditional jump instructions;
4. Arithmetic instructions;
5. Instructions that modify the address part of other instructions.
Figure: configuration of a typical Generation II computer (IBM 7094). The CPU and Main Memory connect through a Multiplexor to three Data Channels, which serve the peripherals: 1, 8 - magnetic tape storage; 2 - puncher; 3 - printer; 4 - punch-card reader; 5 - magnetic drum; 6, 7 - magnetic disk storage; 9 - data communication equipment.
Data Channel is an independent I/O block equipped with its own processor and its own system of instructions. These instructions are stored in the main memory subsystem, but they are executed only by the channel's processor. The CPU initializes a session (the process of starting, conducting, and completing interaction between applications and devices for data transfer) through the channel by sending a signal to the I/O module; after that, all necessary operations are performed by the module in accordance with a program fetched from main memory. On completing the session, the I/O module informs the CPU by a special signal. Thus the CPU is released from executing tasks that are not proper to it.
Multiplexor is a device that serves as a central switch for data transfer among the data channels, the CPU, and the main memory. It may be considered a dispatcher (manager) of access to the main memory by the CPU and the data channels (it allows the channels and the CPU to work independently of one another).
Transistor is an electronic device based on a semiconductor crystal and having three or more electrodes; it is intended for the amplification, generation, or transformation of electric oscillations.
Integrated circuit is an electronic device made by printing thousands or even millions of tiny transistors and other electronic elements on a small silicon crystal (chip), connected in a certain way and treated as a single unit.
Base Electronic Elements of a Computer.
Figure: a) a gate: inputs feed a logical function block that produces an output; b) a memory cell: an electronic circuit with 2 stable states, with input and output lines, a timing signal, and READ and WRITE controls.
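The two element types in the figure can be imitated in a few lines of Python: a NAND gate, and a 1-bit memory cell modeled here as an SR latch built from two cross-coupled NAND gates (one common construction; the figure itself does not specify the circuit):

```python
# a) a gate: NAND.  b) a memory cell: an SR latch made of two
# cross-coupled NAND gates (inputs are active-low).

def nand(a, b):
    return 0 if (a and b) else 1

def sr_latch(s_bar, r_bar, q, q_bar):
    """Let the cross-coupled gates settle and return the new state."""
    for _ in range(3):                     # a few steps suffice here
        q, q_bar = nand(s_bar, q_bar), nand(r_bar, q)
    return q, q_bar

q, q_bar = sr_latch(0, 1, 0, 1)      # SET: the cell now stores 1
assert (q, q_bar) == (1, 0)
q, q_bar = sr_latch(1, 1, q, q_bar)  # hold: the bit is remembered
assert (q, q_bar) == (1, 0)
q, q_bar = sr_latch(1, 0, q, q_bar)  # RESET: the cell now stores 0
assert (q, q_bar) == (0, 1)
```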
Questions to Lecture 2.
1. Describe the Architecture and the Structure (Organization) of computers of generations I, II, III and IV, and compare them.
2. Formulate and analyze Key Concepts of von Neumann
Architecture.
3. Describe the functional structure of von Neumann machine.
4. Describe the functional structure of IAS. List elements of
Architecture and Structure Organization (details) of IAS.
5. List and describe the base electronic components of a contemporary computer.
6. Formulate and analyze Moore's Law.
7. What is Computer System Performance? List the basic characteristics of Computer System Performance.
Arithmetic logic unit
From Wikipedia, the free encyclopedia
Figure: Arithmetic Logic Unit schematic symbol. Figure: cascadable 8-bit ALU, Texas Instruments SN74AS888.
In computing, an arithmetic logic unit (ALU) is a digital circuit that performs arithmetic and
logical operations. The ALU is a fundamental building block of the central processing unit
(CPU) of a computer, and even the simplest microprocessors contain one for purposes such as
maintaining timers. The processors found inside modern CPUs and graphics processing units
(GPUs) accommodate very powerful and very complex ALUs; a single component may
contain a number of ALUs.
Mathematician John von Neumann proposed the ALU concept in 1945, when he wrote a
report on the foundations for a new computer called the EDVAC. Research into ALUs
remains an important part of computer science, falling under Arithmetic and logic
structures in the ACM Computing Classification System.
Numerical systems
Main article: Signed number representations
An ALU must process numbers using the same format as the rest of the digital circuit. The format of modern processors is almost always the two's complement binary number representation. Early computers used a wide variety of number systems, including ones' complement, two's complement, sign-magnitude format, and even true decimal systems, with ten tubes per digit.
ALUs for each one of these numeric systems had different designs, and that influenced the
current preference for two's complement, as this is the representation that makes it easier for
the ALUs to calculate additions and subtractions.
The ones' complement and two's complement number systems allow for subtraction to be
accomplished by adding the negative of a number in a very simple way which negates the
need for specialized circuits to do subtraction; however, calculating the negative in two's
complement requires adding a one to the low order bit and propagating the carry. An
alternative way to do two's complement subtraction of A−B is to present a one to the carry
input of the adder and use ¬B rather than B as the second input.
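This subtraction trick is easy to check numerically; a Python sketch for an assumed 8-bit adder (the width is arbitrary):

```python
# A - B in two's complement: feed NOT B into the adder's second input
# and present a one to the carry-in, so A - B = A + ~B + 1.

N = 8
MASK = (1 << N) - 1            # keep results to N bits, like hardware

def subtract(a, b):
    not_b = ~b & MASK          # use NOT B as the second adder input
    carry_in = 1               # the one presented to the carry input
    return (a + not_b + carry_in) & MASK

assert subtract(7, 5) == 2
assert subtract(5, 7) == (-2 & MASK)   # 0xFE, the representation of -2
```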
Practical overview
Most of a processor's operations are performed by one or more ALUs. An ALU loads data
from input registers, an external Control Unit then tells the ALU what operation to perform on
that data, and then the ALU stores its result into an output register. The Control Unit is
responsible for moving the processed data between these registers, ALU and memory.
Simple operations
Figure: a simple example arithmetic logic unit (a 2-bit ALU) that does AND, OR, XOR, and addition.
Most ALUs can perform the following operations (see the sketch after this list):
- Integer arithmetic operations (addition, subtraction, and sometimes multiplication and division, though this is more expensive).
- Bitwise logic operations (AND, NOT, OR, XOR).
- Bit-shifting operations (shifting or rotating a word by a specified number of bits to the left or right, with or without sign extension). Shifts can be interpreted as multiplications by 2 and divisions by 2.
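A compact Python sketch of these operation classes for an assumed 8-bit ALU, including a carry-out condition code (the operation names are illustrative):

```python
# An 8-bit ALU covering the classes above: integer add, bitwise logic,
# and single-bit shifts; also reports a carry-out flag.

N = 8
MASK = (1 << N) - 1

def alu(op, a, b=0):
    ops = {
        "ADD": a + b,
        "AND": a & b,
        "OR":  a | b,
        "XOR": a ^ b,
        "NOT": ~a,
        "SHL": a << 1,    # shift left: multiply by 2
        "SHR": a >> 1,    # logical shift right: divide by 2
    }
    result = ops[op]
    carry = 1 if result > MASK else 0   # carry-out condition code
    return result & MASK, carry

assert alu("ADD", 200, 100) == (44, 1)  # 300 wraps to 44, carry set
assert alu("SHL", 3) == (6, 0)          # 3 * 2 = 6
```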
Complex operations
Engineers can design an Arithmetic Logic Unit to calculate any operation. The more complex the operation, the more expensive the ALU is, the more space it uses in the processor, and the more power it dissipates. Therefore, engineers compromise: they make the ALU powerful enough to make the processor fast, but not so complex as to become prohibitive. For example, computing the square root of a number might use one of the following designs:
1. Calculation in a single clock: design an extraordinarily complex ALU that calculates the square root of any number in a single step.
2. Calculation pipeline: design a very complex ALU that calculates the square root of any number in several steps. The intermediate results go through a series of circuits arranged like a factory production line; the ALU can accept new numbers to calculate even before having finished the previous ones. The ALU can now produce numbers as fast as a single-clock ALU, although the results start to flow out of the ALU only after an initial delay.
3. Iterative calculation: design a complex ALU that calculates the square root through several steps. This usually relies on control from a complex control unit with built-in microcode.
4. Co-processor: design a simple ALU in the processor, and sell a separate specialized and costly processor that the customer can install just beside this one, and that implements one of the options above.
5. Software libraries: tell the programmers that there is no co-processor and there is no emulation, so they will have to write their own algorithms to calculate square roots in software.
6. Software emulation: emulate the existence of the co-processor; that is, whenever a program attempts to perform the square root calculation, make the processor check whether a co-processor is present and use it if there is one; if there isn't, interrupt the processing of the program and invoke the operating system to perform the square root calculation through some software algorithm.
The options above go from the fastest and most expensive one to the slowest and least
expensive one. Therefore, while even the simplest computer can calculate the most
complicated formula, the simplest computers will usually take a long time doing that because
of the several steps for calculating the formula.
Powerful processors like the Intel Core and AMD64 implement option #1 for several simple
operations, #2 for the most common complex operations and #3 for the extremely complex
operations.
Inputs and outputs
The inputs to the ALU are the data to be operated on (called operands) and a code from the
control unit indicating which operation to perform. Its output is the result of the computation.
In many designs the ALU also takes or generates as inputs or outputs a set of condition codes
from or to a status register. These codes are used to indicate cases such as carry-in or carry-
out, overflow, divide-by-zero, etc.
ALUs vs. FPUs
A Floating-Point Unit (FPU) also performs arithmetic operations between two values, but it does so for numbers in floating-point representation, which is much more complicated than the two's complement representation used in a typical ALU. In order to do these calculations, an FPU has several complex circuits built in, including some internal ALUs.
In modern practice, engineers typically refer to the ALU as the circuit that performs integer
arithmetic operations (like two's complement and BCD). Circuits that calculate more complex
formats like floating point, complex numbers, etc. usually receive a more specific name such
as FPU.
Computer
From Wikipedia, the free encyclopedia
A computer is a programmable machine that receives input, stores and automatically
manipulates data, and provides output in a useful format.
The first electronic computers were developed in the mid-20th century (1940–1945).
Originally, they were the size of a large room, consuming as much power as several hundred
modern personal computers (PCs).[1]
Modern computers based on integrated circuits are millions to billions of times more capable
than the early machines, and occupy a fraction of the space.[2]
Simple computers are small
enough to fit into mobile devices, and can be powered by a small battery. Personal computers
in their various forms are icons of the Information Age and are what most people think of as
"computers". However, the embedded computers found in many devices from MP3 players to
fighter aircraft and from toys to industrial robots are the most numerous.
History of computing
Main article: History of computing hardware
The first use of the word "computer" was recorded in 1613, referring to a person who carried
out calculations, or computations, and the word continued with the same meaning until the
middle of the 20th century. From the end of the 19th century onwards, the word began to take
on its more familiar meaning, describing a machine that carries out computations.[3]
Limited-function ancient computers
The Jacquard loom, on display at the Museum of Science and Industry in Manchester,
England, was one of the first programmable devices.
The history of the modern computer begins with two separate technologies—automated
calculation and programmability—but no single device can be identified as the earliest
computer, partly because of the inconsistent application of that term. Examples of early
mechanical calculating devices include the abacus, the slide rule and arguably the astrolabe
and the Antikythera mechanism, an ancient astronomical computer built by the Greeks around
80 BC.[4]
The Greek mathematician Hero of Alexandria (c. 10–70 AD) built a mechanical
theater which performed a play lasting 10 minutes and was operated by a complex system of
ropes and drums that might be considered to be a means of deciding which parts of the
mechanism performed which actions and when.[5]
This is the essence of programmability.
The "castle clock", an astronomical clock invented by Al-Jazari in 1206, is considered to be
the earliest programmable analog computer.[6]
It displayed the zodiac, the solar
and lunar orbits, a crescent moon-shaped pointer travelling across a gateway causing
automatic doors to open every hour,[7][8]
and five robotic musicians who played music when
struck by levers operated by a camshaft attached to a water wheel. The length of day and
night could be re-programmed to compensate for the changing lengths of day and night
throughout the year.[6]
The Renaissance saw a re-invigoration of European mathematics and engineering. Wilhelm
Schickard's 1623 device was the first of a number of mechanical calculators constructed by
European engineers, but none fit the modern definition of a computer, because they could not
be programmed.
First general-purpose computers
In 1801, Joseph Marie Jacquard made an improvement to the textile loom by introducing a
series of punched paper cards as a template which allowed his loom to weave intricate
patterns automatically. The resulting Jacquard loom was an important step in the development
of computers because the use of punched cards to define woven patterns can be viewed as an
early, albeit limited, form of programmability.
The Most Famous Image in the Early History of Computing[9]
This portrait of Jacquard was woven in silk on a Jacquard loom and required 24,000 punched cards to create (1839). It was only produced to order. Charles Babbage owned one of these portraits; it inspired him in using perforated cards in his analytical engine.[10]
It was the fusion of automatic calculation with programmability that produced the first
recognizable computers. In 1837, Charles Babbage was the first to conceptualize and design a
fully programmable mechanical computer, his analytical engine.[11]
Limited finances and
Babbage's inability to resist tinkering with the design meant that the device was never
completed; nevertheless his son, Henry Babbage, completed a simplified version of the analytical engine's computing unit (the mill) in 1888. He gave a successful demonstration of its use in computing tables in 1906. This machine was given to the Science Museum in South Kensington in 1910.
In the late 1880s, Herman Hollerith invented the recording of data on a machine readable
medium. Prior uses of machine readable media, above, had been for control, not data. "After
some initial trials with paper tape, he settled on punched cards ..."[12]
To process these
punched cards he invented the tabulator, and the keypunch machines. These three inventions
were the foundation of the modern information processing industry. Large-scale automated
data processing of punched cards was performed for the 1890 United States Census by
Hollerith's company, which later became the core of IBM. By the end of the 19th century a
number of technologies that would later prove useful in the realization of practical computers
had begun to appear: the punched card, Boolean algebra, the vacuum tube (thermionic valve)
and the teleprinter.
During the first half of the 20th century, many scientific computing needs were met by
increasingly sophisticated analog computers, which used a direct mechanical or electrical
model of the problem as a basis for computation. However, these were not programmable and
generally lacked the versatility and accuracy of modern digital computers.
Alan Turing is widely regarded to be the father of modern computer science. In 1936 Turing
provided an influential formalisation of the concept of the algorithm and computation with the
Turing machine, providing a blueprint for the electronic digital computer.[13]
Of his role in the
creation of the modern computer, Time magazine in naming Turing one of the 100 most
influential people of the 20th century, states: "The fact remains that everyone who taps at a
keyboard, opening a spreadsheet or a word-processing program, is working on an incarnation
of a Turing machine".[13]
Figures: the Zuse Z3 (1941), considered the world's first working programmable, fully automatic computing machine; the ENIAC (operational in 1946), considered to be the first general-purpose electronic computer; EDSAC, one of the first computers to implement the stored-program (von Neumann) architecture; the die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging.
The Atanasoff–Berry Computer (ABC) was among the first electronic digital binary
computing devices. Conceived in 1937 by Iowa State College physics professor John
Atanasoff, and built with the assistance of graduate student Clifford Berry,[14]
the machine
was not programmable, being designed only to solve systems of linear equations. The
computer did employ parallel computation. A 1973 court ruling in a patent dispute found that
the patent for the 1946 ENIAC computer derived from the Atanasoff–Berry Computer.
The inventor of the program-controlled computer was Konrad Zuse, who built the first
working computer in 1941 and later in 1955 the first computer based on magnetic storage.[15]
George Stibitz is internationally recognized as a father of the modern digital computer. While
working at Bell Labs in November 1937, Stibitz invented and built a relay-based calculator he
dubbed the "Model K" (for "kitchen table", on which he had assembled it), which was the first
to use binary circuits to perform an arithmetic operation. Later models added greater
sophistication including complex arithmetic and programmability.[16]
A succession of steadily more powerful and flexible computing devices were constructed in
the 1930s and 1940s, gradually adding the key features that are seen in modern computers.
The use of digital electronics (largely invented by Claude Shannon in 1937) and more flexible
programmability were vitally important steps, but defining one point along this road as "the
first digital electronic computer" is difficult (Shannon 1940).
Notable achievements include.
Konrad Zuse's electromechanical "Z machines". The Z3 (1941) was the first working
machine featuring binary arithmetic, including floating point arithmetic and a measure
of programmability. In 1998 the Z3 was shown to be Turing complete, making it, in that
sense, the world's first operational computer.[17]
The non-programmable Atanasoff–Berry Computer (commenced in 1937, completed
in 1941) which used vacuum tube based computation, binary numbers, and
regenerative capacitor memory. The use of regenerative memory allowed it to be
much more compact than its peers (being approximately the size of a large desk or
workbench), since intermediate results could be stored and then fed back into the same
set of computation elements.
The secret British Colossus computers (1943),[18]
which had limited programmability
but demonstrated that a device using thousands of tubes could be reasonably reliable
and electronically reprogrammable. It was used for breaking German wartime codes.
The Harvard Mark I (1944), a large-scale electromechanical computer with limited
programmability.[19]
The U.S. Army's Ballistic Research Laboratory ENIAC (1946), which used decimal
arithmetic and is sometimes called the first general purpose electronic computer (since
Konrad Zuse's Z3 of 1941 used electromagnets instead of electronics). Initially,
however, ENIAC had an inflexible architecture which essentially required rewiring to
change its programming.
Stored-program architecture
Several developers of ENIAC, recognizing its flaws, came up with a far more flexible and
elegant design, which came to be known as the "stored program architecture" or von
Neumann architecture. This design was first formally described by John von Neumann in the
paper First Draft of a Report on the EDVAC, distributed in 1945. A number of projects to
develop computers based on the stored-program architecture commenced around this time, the
first of these being completed in Great Britain. The first working prototype to be
demonstrated was the Manchester Small-Scale Experimental Machine (SSEM or "Baby") in
1948. The Electronic Delay Storage Automatic Calculator (EDSAC), completed a year after
the SSEM at Cambridge University, was the first practical, non-experimental implementation
of the stored program design and was put to use immediately for research work at the
university. Shortly thereafter, the machine originally described by von Neumann's paper—
EDVAC—was completed but did not see full-time use for an additional two years.
Nearly all modern computers implement some form of the stored-program architecture,
making it the single trait by which the word "computer" is now defined. While the
technologies used in computers have changed dramatically since the first electronic, general-
purpose computers of the 1940s, most still use the von Neumann architecture.
Beginning in the 1950s, Soviet scientists Sergei Sobolev and Nikolay Brusentsov conducted
research on ternary computers, devices that operated on a base three numbering system of −1,
0, and 1 rather than the conventional binary numbering system upon which most computers
are based. They designed the Setun, a functional ternary computer, at Moscow State
University. The device was put into limited production in the Soviet Union, but was
supplanted by the more common binary architecture.
Semiconductors and microprocessors
Computers using vacuum tubes as their electronic elements were in use throughout the 1950s,
but by the 1960s had been largely replaced by transistor-based machines, which were smaller,
faster, cheaper to produce, required less power, and were more reliable. The first
transistorised computer was demonstrated at the University of Manchester in 1953.[20]
In the
1970s, integrated circuit technology and the subsequent creation of microprocessors, such as
the Intel 4004, further decreased size and cost and further increased speed and reliability of
computers. By the late 1970s, many products such as video recorders contained dedicated
computers called microcontrollers, and they started to appear as a replacement for mechanical
controls in domestic appliances such as washing machines. The 1980s witnessed home
computers and the now ubiquitous personal computer. With the evolution of the Internet,
personal computers are becoming as common as the television and the telephone in the
household.
Modern smartphones are fully programmable computers in their own right, and as of 2009
may well be the most common form of such computers in existence.
Programs
The defining feature of modern computers which distinguishes them from all other machines
is that they can be programmed. That is to say that some set of instructions (the program)
can be given to the computer, and it will carry them out. While some computers may have
exotic notions of "instructions" and "output" (see quantum computing), modern computers
based on the von Neumann architecture usually have machine code in the form of an
imperative programming language.
In practical terms, a computer program may be just a few instructions or extend to many
millions of instructions, as do the programs for word processors and web browsers for
example. A typical modern computer can execute billions of instructions per second
(gigaflops) and rarely makes a mistake over many years of operation. Large computer
programs consisting of several million instructions may take teams of programmers years to
write, and due to the complexity of the task almost certainly contain errors.
Stored program architecture
Main articles: Computer program and Computer programming
[Image: A 1970s punched card containing one line from a FORTRAN program. The card reads
"Z(1) = Y + W(1)" and is labelled "PROJ039" for identification purposes.]
This section applies to most common RAM machine-based computers.
In most cases, computer instructions are simple: add one number to another, move some data
from one location to another, send a message to some external device, etc. These instructions
are read from the computer's memory and are generally carried out (executed) in the order
they were given. However, there are usually specialized instructions to tell the computer to
jump ahead or backwards to some other place in the program and to carry on executing from
there. These are called "jump" instructions (or branches). Furthermore, jump instructions may
be made to happen conditionally so that different sequences of instructions may be used
depending on the result of some previous calculation or some external event. Many computers
directly support subroutines by providing a type of jump that "remembers" the location it
jumped from and another instruction to return to the instruction following that jump
instruction.
Program execution might be likened to reading a book. While a person will normally read
each word and line in sequence, they may at times jump back to an earlier place in the text or
skip sections that are not of interest. Similarly, a computer may sometimes go back and repeat
the instructions in some section of the program over and over again until some internal
condition is met. This is called the flow of control within the program and it is what allows
the computer to perform tasks repeatedly without human intervention.
Comparatively, a person using a pocket calculator can perform a basic arithmetic operation
such as adding two numbers with just a few button presses. But to add together all of the
numbers from 1 to 1,000 would take thousands of button presses and a lot of time—with a
near certainty of making a mistake. On the other hand, a computer may be programmed to do
this with just a few simple instructions. For example:
mov #0, sum ; set sum to 0
mov #1, num ; set num to 1
loop: add num, sum ; add num to sum
add #1, num ; add 1 to num
cmp num, #1000 ; compare num to 1000
ble loop ; if num <= 1000, go back to 'loop'
halt ; end of program. stop running
Once told to run this program, the computer will perform the repetitive addition task without
further human intervention. It will almost never make a mistake and a modern PC can
complete the task in about a millionth of a second.[21]
Bugs
Errors in computer programs are called "bugs". Bugs may be benign and not affect the
usefulness of the program, or have only subtle effects. But in some cases they may cause the
program to "hang"—become unresponsive to input such as mouse clicks or keystrokes, or to
completely fail or "crash". Otherwise benign bugs may sometimes be harnessed for malicious
intent by an unscrupulous user writing an "exploit"—code designed to take advantage of a
bug and disrupt a computer's proper execution. Bugs are usually not the fault of the computer.
Since computers merely execute the instructions they are given, bugs are nearly always the
result of programmer error or an oversight made in the program's design.[22]
Machine code
In most computers, individual instructions are stored as machine code with each instruction
being given a unique number (its operation code or opcode for short). The command to add
two numbers together would have one opcode, the command to multiply them would have a
different opcode and so on. The simplest computers are able to perform any of a handful of
different instructions; the more complex computers have several hundred to choose from—
each with a unique numerical code. Since the computer's memory is able to store numbers, it
can also store the instruction codes. This leads to the important fact that entire programs
(which are just lists of these instructions) can be represented as lists of numbers and can
themselves be manipulated inside the computer in the same way as numeric data. The
fundamental concept of storing programs in the computer's memory alongside the data they
operate on is the crux of the von Neumann, or stored program, architecture. In some cases, a
computer might store some or all of its program in memory that is kept separate from the data
it operates on. This is called the Harvard architecture after the Harvard Mark I computer.
Modern von Neumann computers display some traits of the Harvard architecture in their
designs, such as in CPU caches.
While it is possible to write computer programs as long lists of numbers (machine language)
and while this technique was used with many early computers,[23]
it is extremely tedious and
potentially error-prone to do so in practice, especially for complicated programs. Instead, each
basic instruction can be given a short name that is indicative of its function and easy to
remember—a mnemonic such as ADD, SUB, MULT or JUMP. These mnemonics are
collectively known as a computer's assembly language. Converting programs written in
assembly language into something the computer can actually understand (machine language)
is usually done by a computer program called an assembler. Machine languages and the
assembly languages that represent them (collectively termed low-level programming
languages) tend to be unique to a particular type of computer. For instance, an ARM
architecture computer (such as may be found in a PDA or a hand-held videogame) cannot
understand the machine language of an Intel Pentium or the AMD Athlon 64 computer that
might be in a PC.[24]
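To make the relationship between mnemonics and machine code concrete, the following
sketch shows the essence of what an assembler does. It is a minimal illustration in Python,
not a real assembler; the mnemonics, the opcode numbers and the one-operand instruction
format are all invented for the example.

# Minimal sketch of an assembler: translate mnemonics into opcode numbers.
# The mnemonics, opcode values and "one operand per instruction" format
# below are invented for illustration only.
OPCODES = {"ADD": 0x01, "SUB": 0x02, "MULT": 0x03, "JUMP": 0x04}

def assemble(lines):
    """Turn 'MNEMONIC operand' source lines into a flat list of numbers."""
    machine_code = []
    for line in lines:
        mnemonic, operand = line.split()
        machine_code.append(OPCODES[mnemonic])  # the numeric opcode
        machine_code.append(int(operand))       # the numeric operand
    return machine_code

print(assemble(["ADD 7", "MULT 3", "JUMP 0"]))  # [1, 7, 3, 3, 4, 0]

The output is nothing but a list of numbers, which is exactly the form in which a program
sits in memory alongside the data it operates on.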
Higher-level languages and program design
Though considerably easier than in machine language, writing long programs in assembly
language is often difficult and is also error prone. Therefore, most practical programs are
written in more abstract high-level programming languages that are able to express the needs
of the programmer more conveniently (and thereby help reduce programmer error). High level
languages are usually "compiled" into machine language (or sometimes into assembly
language and then into machine language) using another computer program called a
compiler.[25]
High level languages are less related to the workings of the target computer than
assembly language, and more related to the language and structure of the problem(s) to be
solved by the final program. It is therefore often possible to use different compilers to
translate the same high level language program into the machine language of many different
types of computer. This is part of the means by which software like video games may be
made available for different computer architectures such as personal computers and various
video game consoles.
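As a contrast with the earlier assembly-language example, the same 1-to-1,000 summation
can be written in one line of a high-level language. This Python sketch is only illustrative; a
compiler or interpreter turns a construct like this into many machine instructions resembling
the assembly version.

# The 1-to-1,000 summation from the earlier assembly example: the loop,
# counter, comparison and branch are all implied by built-in constructs.
total = sum(range(1, 1001))
print(total)  # 500500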
The task of developing large software systems presents a significant intellectual challenge.
Producing software with an acceptably high reliability within a predictable schedule and
budget has historically been difficult; the academic and professional discipline of software
engineering concentrates specifically on this challenge.
Function
Main articles: Central processing unit and Microprocessor
A general purpose computer has four main components: the arithmetic logic unit (ALU), the
control unit, the memory, and the input and output devices (collectively termed I/O). These
parts are interconnected by busses, often made of groups of wires.
Inside each of these parts are thousands to trillions of small electrical circuits which can be
turned off or on by means of an electronic switch. Each circuit represents a bit (binary digit)
of information so that when the circuit is on it represents a "1", and when off it represents a
"0" (in positive logic representation). The circuits are arranged in logic gates so that one or
more of the circuits may control the state of one or more of the other circuits.
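The following sketch expresses that idea in Python: each gate is a function on bits, and,
taking NAND as the primitive switching circuit, the other common gates can be derived from
it. This illustrates the principle only, not any particular hardware.

# Gates as functions on bits (1 = circuit on, 0 = circuit off).
# NAND is taken as the primitive; the other gates are built from it,
# showing how some circuits control the state of others.
def NAND(a, b):
    return 0 if (a and b) else 1

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NOT(NAND(a, b))

def OR(a, b):
    return NAND(NOT(a), NOT(b))

def XOR(a, b):
    return AND(OR(a, b), NAND(a, b))

print(XOR(1, 0), XOR(1, 1))  # 1 0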
The control unit, ALU, registers, and basic I/O (and often other hardware closely linked with
these) are collectively known as a central processing unit (CPU). Early CPUs were composed
of many separate components but since the mid-1970s CPUs have typically been constructed
on a single integrated circuit called a microprocessor.
Control unit
Main articles: CPU design and Control unit
[Image: Diagram showing how a particular MIPS architecture instruction would be decoded by
the control system.]
The control unit (often called a control system or central controller) manages the computer's
various components; it reads and interprets (decodes) the program instructions, transforming
them into a series of control signals which activate other parts of the computer.[26]
Control
systems in advanced computers may change the order of some instructions so as to improve
performance.
A key component common to all CPUs is the program counter, a special memory cell (a
register) that keeps track of which location in memory the next instruction is to be read
from.[27]
The control system's function is as follows—note that this is a simplified description, and
some of these steps may be performed concurrently or in a different order depending on the
type of CPU:
1. Read the code for the next instruction from the cell indicated by the program counter.
2. Decode the numerical code for the instruction into a set of commands or signals for
each of the other systems.
3. Increment the program counter so it points to the next instruction.
4. Read whatever data the instruction requires from cells in memory (or perhaps from an
input device). The location of this required data is typically stored within the
instruction code.
5. Provide the necessary data to an ALU or register.
6. If the instruction requires an ALU or specialized hardware to complete, instruct the
hardware to perform the requested operation.
7. Write the result from the ALU back to a memory location or to a register or perhaps an
output device.
8. Jump back to step (1).
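The eight steps above can be condensed into a very small simulator. The sketch below, in
Python, runs a fetch-decode-execute loop over a memory that holds both instructions and
data; the four-instruction set and the "opcode followed by one address" format are invented
purely for illustration.

# A toy fetch-decode-execute loop. Memory holds instructions and data;
# each instruction is an opcode followed by one memory address.
LOAD, ADD, STORE, HALT = 0, 1, 2, 3

def run(memory):
    pc, acc = 0, 0                                  # program counter, accumulator
    while True:
        opcode, addr = memory[pc], memory[pc + 1]   # fetch and decode
        pc += 2                                     # point at the next instruction
        if opcode == LOAD:
            acc = memory[addr]                      # read data from memory
        elif opcode == ADD:
            acc += memory[addr]                     # ALU operation
        elif opcode == STORE:
            memory[addr] = acc                      # write the result back
        elif opcode == HALT:
            return memory

# Program: memory[10] + memory[11] -> memory[12]
mem = [LOAD, 10, ADD, 11, STORE, 12, HALT, 0, 0, 0, 2, 3, 0]
print(run(mem)[12])  # 5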
Since the program counter is (conceptually) just another set of memory cells, it can be
changed by calculations done in the ALU. Adding 100 to the program counter would cause
the next instruction to be read from a place 100 locations further down the program.
Instructions that modify the program counter are often known as "jumps" and allow for loops
(instructions that are repeated by the computer) and often conditional instruction execution
(both examples of control flow).
It is noticeable that the sequence of operations that the control unit goes through to process an
instruction is in itself like a short computer program—and indeed, in some more complex
CPU designs, there is another yet smaller computer called a microsequencer that runs a
microcode program that causes all of these events to happen.
Arithmetic/logic unit (ALU)
Main article: Arithmetic logic unit
The ALU is capable of performing two classes of operations: arithmetic and logic.[28]
The set of arithmetic operations that a particular ALU supports may be limited to adding and
subtracting or might include multiplying or dividing, trigonometry functions (sine, cosine,
etc.) and square roots. Some can only operate on whole numbers (integers) whilst others use
floating point to represent real numbers—albeit with limited precision. However, any
computer that is capable of performing just the simplest operations can be programmed to
break down the more complex operations into simple steps that it can perform. Therefore, any
computer can be programmed to perform any arithmetic operation—although it will take
more time to do so if its ALU does not directly support the operation. An ALU may also
compare numbers and return boolean truth values (true or false) depending on whether one is
equal to, greater than or less than the other ("is 64 greater than 65?").
Logic operations involve Boolean logic: AND, OR, XOR and NOT. These can be useful both
for creating complicated conditional statements and processing boolean logic.
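A sketch of this view of the ALU, in Python: an operation code selects which arithmetic,
logic or comparison result is produced. The operation names here are invented labels, not
the encoding of any real processor.

# An ALU modelled as a function: the operation code selects the result.
def alu(op, a, b):
    if op == "ADD":
        return a + b
    if op == "SUB":
        return a - b
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "XOR":
        return a ^ b
    if op == "GT":
        return a > b        # comparison: a boolean truth value
    raise ValueError("unsupported operation: " + op)

print(alu("ADD", 2, 3), alu("GT", 64, 65))  # 5 False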
Superscalar computers may contain multiple ALUs so that they can process several
instructions at the same time.[29]
Graphics processors and computers with SIMD and MIMD
features often provide ALUs that can perform arithmetic on vectors and matrices.
Memory
Main article: Computer data storage
Magnetic core memory was the computer memory of choice throughout the 1960s, until it
was replaced by semiconductor memory.
A computer's memory can be viewed as a list of cells into which numbers can be placed or
read. Each cell has a numbered "address" and can store a single number. The computer can be
instructed to "put the number 123 into the cell numbered 1357" or to "add the number that is
in cell 1357 to the number that is in cell 2468 and put the answer into cell 1595". The
information stored in memory may represent practically anything. Letters, numbers, even
computer instructions can be placed into memory with equal ease. Since the CPU does not
differentiate between different types of information, it is the software's responsibility to give
significance to what the memory sees as nothing but a series of numbers.
In almost all modern computers, each memory cell is set up to store binary numbers in groups
of eight bits (called a byte). Each byte is able to represent 256 different numbers (2^8 = 256);
either from 0 to 255 or −128 to +127. To store larger numbers, several consecutive bytes may
be used (typically, two, four or eight). When negative numbers are required, they are usually
stored in two's complement notation. Other arrangements are possible, but are usually not
seen outside of specialized applications or historical contexts. A computer can store any kind
of information in memory if it can be represented numerically. Modern computers have
billions or even trillions of bytes of memory.
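The two readings of a byte can be shown directly. The Python sketch below wraps a value
into an 8-bit pattern and then interprets that same pattern as a two's complement signed
number; it is a minimal illustration of the convention, not a description of any particular
hardware.

# One 8-bit byte (2^8 = 256 patterns) read two ways:
# unsigned 0..255, or two's complement signed -128..127.
def to_twos_complement(value, bits=8):
    return value & ((1 << bits) - 1)        # keep only the low 'bits' bits

def from_twos_complement(pattern, bits=8):
    if pattern >= 1 << (bits - 1):          # high bit set means negative
        return pattern - (1 << bits)
    return pattern

p = to_twos_complement(-1)
print(p, from_twos_complement(p))           # 255 -1: same byte, two readings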
The CPU contains a special set of memory cells called registers that can be read and written to
much more rapidly than the main memory area. There are typically between two and one
hundred registers depending on the type of CPU. Registers are used for the most frequently
needed data items to avoid having to access main memory every time data is needed. As data
is constantly being worked on, reducing the need to access main memory (which is often slow
compared to the ALU and control units) greatly increases the computer's speed.
Computer main memory comes in two principal varieties: random-access memory or RAM
and read-only memory or ROM. RAM can be read and written to anytime the CPU
commands it, but ROM is pre-loaded with data and software that never changes, so the CPU
can only read from it. ROM is typically used to store the computer's initial start-up
instructions. In general, the contents of RAM are erased when the power to the computer is
turned off, but ROM retains its data indefinitely. In a PC, the ROM contains a specialized
program called the BIOS that orchestrates loading the computer's operating system from the
hard disk drive into RAM whenever the computer is turned on or reset. In embedded
computers, which frequently do not have disk drives, all of the required software may be
stored in ROM. Software stored in ROM is often called firmware, because it is notionally
more like hardware than software. Flash memory blurs the distinction between ROM and
RAM, as it retains its data when turned off but is also rewritable. It is typically much slower
than conventional ROM and RAM however, so its use is restricted to applications where high
speed is unnecessary.[30]
In more sophisticated computers there may be one or more RAM cache memories which are
slower than registers but faster than main memory. Generally computers with this sort of
cache are designed to move frequently needed data into the cache automatically, often without
the need for any intervention on the programmer's part.
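The core idea can be sketched in a few lines of Python: a small table of recently used
addresses is consulted first, and main memory is touched only on a miss. The
least-recently-used eviction policy used here is one common choice among several; real
caches work on fixed-size lines and are built in hardware.

# Sketch of a cache: a small fast table in front of (slow) main memory.
from collections import OrderedDict

class Cache:
    def __init__(self, memory, capacity=4):
        self.memory = memory
        self.capacity = capacity
        self.lines = OrderedDict()              # address -> cached value

    def read(self, addr):
        if addr in self.lines:                  # hit: answer from the fast table
            self.lines.move_to_end(addr)
            return self.lines[addr]
        value = self.memory[addr]               # miss: go to main memory
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)      # evict the least recently used
        return value

cache = Cache(memory=list(range(100)))
print(cache.read(5), cache.read(5))             # second read is a cache hit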
Input/output (I/O)
Main article: Input/output
Hard disk drives are common storage devices used with computers.
I/O is the means by which a computer exchanges information with the outside world.[31]
Devices that provide input or output to the computer are called peripherals.[32]
On a typical
personal computer, peripherals include input devices like the keyboard and mouse, and output
devices such as the display and printer. Hard disk drives, floppy disk drives and optical disc
drives serve as both input and output devices. Computer networking is another form of I/O.
Often, I/O devices are complex computers in their own right with their own CPU and
memory. A graphics processing unit might contain fifty or more tiny computers that perform
the calculations necessary to display 3D graphics. Modern desktop computers
contain many smaller computers that assist the main CPU in performing I/O.
Multitasking
Main article: Computer multitasking
While a computer may be viewed as running one gigantic program stored in its main memory,
in some systems it is necessary to give the appearance of running several programs
simultaneously. This is achieved by multitasking i.e. having the computer switch rapidly
between running each program in turn.[33]
One means by which this is done is with a special signal called an interrupt which can
periodically cause the computer to stop executing instructions where it was and do something
else instead. By remembering where it was executing prior to the interrupt, the computer can
return to that task later. If several programs are running "at the same time", then the interrupt
generator might be causing several hundred interrupts per second, causing a program switch
each time. Since modern computers typically execute instructions several orders of magnitude
faster than human perception, it may appear that many programs are running at the same time
even though only one is ever executing in any given instant. This method of multitasking is
sometimes termed "time-sharing" since each program is allocated a "slice" of time in turn.[34]
Before the era of cheap computers, the principal use for multitasking was to allow many
people to share the same computer.
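The switching described above can be mimicked in miniature. In the Python sketch below
each "program" is a generator that voluntarily yields after a little work, and a round-robin
scheduler gives each one a slice in turn, much as an interrupt-driven time-sharing system
would; real systems preempt programs rather than relying on them to yield.

# Toy time-sharing: generators yield the "CPU", a scheduler rotates them.
def program(name, steps):
    for i in range(steps):
        print(name, "step", i)
        yield                           # end of this program's time slice

def scheduler(tasks):
    while tasks:
        task = tasks.pop(0)             # round-robin: next program in line
        try:
            next(task)                  # run it for one slice
            tasks.append(task)          # not finished: back of the queue
        except StopIteration:
            pass                        # this program has finished

scheduler([program("A", 2), program("B", 3)])
# Output interleaves A and B: A 0, B 0, A 1, B 1, B 2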
Seemingly, multitasking would cause a computer that is switching between several programs
to run more slowly — in direct proportion to the number of programs it is running. However,
most programs spend much of their time waiting for slow input/output devices to complete
their tasks. If a program is waiting for the user to click on the mouse or press a key on the
keyboard, then it will not take a "time slice" until the event it is waiting for has occurred. This
frees up time for other programs to execute so that many programs may be run at the same
time without unacceptable speed loss.
Multiprocessing
Main article: Multiprocessing
Cray designed many supercomputers that used multiprocessing heavily.
Some computers are designed to distribute their work across several CPUs in a
multiprocessing configuration, a technique once employed only in large and powerful
machines such as supercomputers, mainframe computers and servers. Multiprocessor and
multi-core (multiple CPUs on a single integrated circuit) personal and laptop computers are
now widely available, and are being increasingly used in lower-end markets as a result.
Supercomputers in particular often have highly distinctive architectures that differ significantly
from the basic stored-program architecture and from general purpose computers.[35]
They
often feature thousands of CPUs, customized high-speed interconnects, and specialized
computing hardware. Such designs tend to be useful only for specialized tasks due to the large
scale of program organization required to successfully utilize most of the available resources
at once. Supercomputers usually see usage in large-scale simulation, graphics rendering, and
cryptography applications, as well as with other so-called "embarrassingly parallel" tasks.
Networking and the Internet
Main articles: Computer networking and Internet
[Image: Visualization of a portion of the routes on the Internet.]
Computers have been used to coordinate information between multiple locations since the
1950s. The U.S. military's SAGE system was the first large-scale example of such a system,
which led to a number of special-purpose commercial systems like Sabre.[36]
In the 1970s, computer engineers at research institutions throughout the United States began
to link their computers together using telecommunications technology. This effort was funded
by ARPA (now DARPA), and the computer network that it produced was called the
ARPANET.[37]
The technologies that made the Arpanet possible spread and evolved.
In time, the network spread beyond academic and military institutions and became known as
the Internet. The emergence of networking involved a redefinition of the nature and
boundaries of the computer. Computer operating systems and applications were modified to
include the ability to define and access the resources of other computers on the network, such
as peripheral devices, stored information, and the like, as extensions of the resources of an
individual computer. Initially these facilities were available primarily to people working in
high-tech environments, but in the 1990s the spread of applications like e-mail and the World
Wide Web, combined with the development of cheap, fast networking technologies like
Ethernet and ADSL saw computer networking become almost ubiquitous. In fact, the number
of computers that are networked is growing phenomenally. A very large proportion of
personal computers regularly connect to the Internet to communicate and receive information.
"Wireless" networking, often utilizing mobile phone networks, has meant networking is
becoming increasingly ubiquitous even in mobile computing environments.
Misconceptions
A computer does not need to be electric, nor even have a processor, nor RAM, nor even a
hard disk. The minimal definition of a computer is anything that transforms information in a
purposeful way.
Required technology
Main article: Unconventional computing
Computational systems as flexible as a personal computer can be built out of almost anything.
For example, a computer can be made out of billiard balls (the billiard ball computer), an
unintuitive but instructive demonstration of this flexibility. More realistically, modern
computers are made out of transistors made of photolithographed semiconductors.
Historically, computers evolved from mechanical computers and eventually from vacuum
tubes to transistors.
There is active research to make computers out of many promising new types of technology,
such as optical computing, DNA computers, neural computers, and quantum computers. Some
of these can easily tackle problems that modern computers cannot (such as how quantum
computers can break some modern encryption algorithms by quantum factoring).
Computer architecture paradigms
Some different paradigms of how to build a computer from the ground up:
RAM machines
These are the types of computers with a CPU, computer memory, etc., which
understand basic instructions in a machine language. The concept evolved from the
Turing machine.
Brains
Brains are massively parallel processors made of neurons, wired in intricate patterns,
that communicate via electricity and neurotransmitter chemicals.
Programming languages
Such as the lambda calculus, or modern programming languages, are virtual
computers built on top of other computers.
Cellular automata
For example, the game of Life can create "gliders" and "loops" and other constructs
that transmit information; this paradigm can be applied to DNA computing, chemical
computing, etc.
Groups and committees
The linking of multiple computers (brains) is itself a computer.
Logic gates are a common abstraction which can apply to most of the above digital or analog
paradigms.
The ability to store and execute lists of instructions called programs makes computers
extremely versatile, distinguishing them from calculators. The Church–Turing thesis is a
mathematical statement of this versatility: any computer with a minimum capability (being
Turing-complete) is, in principle, capable of performing the same tasks that any other
computer can perform. Therefore, any type of computer (netbook, supercomputer, cellular
automaton, etc.) is able to perform the same computational tasks, given enough time and
storage capacity.
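The machinery needed for such universality is remarkably small. The Python sketch below
simulates a Turing machine whose whole behaviour is a table of (state, symbol) -> (write,
move, next state) rules; the particular rule table (flip bits until a blank) is an invented toy,
chosen only to show how little a Turing-complete model requires.

# A tiny Turing machine simulator: state + symbol -> write, move, state.
def run_tm(tape, rules, state="start"):
    cells, head = dict(enumerate(tape)), 0
    while state != "halt":
        symbol = cells.get(head, "_")           # "_" is the blank symbol
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return [cells[i] for i in sorted(cells)]

rules = {
    ("start", "0"): ("1", "R", "start"),        # flip 0 to 1, move right
    ("start", "1"): ("0", "R", "start"),        # flip 1 to 0, move right
    ("start", "_"): ("_", "R", "halt"),         # blank: stop
}
print(run_tm(list("1011"), rules))              # ['0', '1', '0', '0', '_']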
Limited-function computers
Conversely, a computer which is limited in function (one that is not "Turing-complete")
cannot simulate arbitrary things. For example, a simple four-function calculator cannot
simulate a real computer without human intervention. As a more complicated example, a
gaming console that cannot be programmed by its user can never accomplish what a
programmable calculator from the 1990s could (given enough time); the system as a whole is
not Turing-complete, even though it contains a Turing-complete component (the
microprocessor). Living organisms (the body, not the brain) are also limited-function
computers designed to make copies of themselves; they cannot be reprogrammed without
genetic engineering.
Virtual computers
A "computer" is commonly considered to be a physical device. However, one can create a
computer program which describes how to run a different computer, i.e. "simulating a
computer in a computer". Not only is this a constructive proof of the Church-Turing thesis,
but is also extremely common in all modern computers. For example, some programming
languages use something called an interpreter, which is a simulated computer built on top of
the basic computer; this allows programmers to write code (computer input) in a different
language than the one understood by the base computer (the alternative is to use a compiler).
Additionally, virtual machines are simulated computers which virtually replicate a physical
computer in software, and are very commonly used in IT. Virtual machines are also a
common technique used to create emulators, such as game console emulators.
Further topics
Glossary of computers
Artificial intelligence
A computer solves problems in exactly the way it is programmed to, without regard to
efficiency, alternative solutions, possible shortcuts, or possible errors in the code.
Computer programs which learn and adapt are part of the emerging field of artificial
intelligence and machine learning.
Hardware
The term hardware covers all of those parts of a computer that are tangible objects. Circuits,
displays, power supplies, cables, keyboards, printers and mice are all hardware.
History of computing hardware
First Generation (Mechanical/Electromechanical)
Calculators: Antikythera mechanism, Difference engine, Norden bombsight
Programmable devices: Jacquard loom, Analytical engine, Harvard Mark I, Z3

Second Generation (Vacuum Tubes)
Calculators: Atanasoff–Berry Computer, IBM 604, UNIVAC 60, UNIVAC 120
Programmable devices: Colossus, ENIAC, Manchester Small-Scale Experimental Machine,
EDSAC, Manchester Mark 1, Ferranti Pegasus, Ferranti Mercury, CSIRAC, EDVAC,
UNIVAC I, IBM 701, IBM 702, IBM 650, Z22

Third Generation (Discrete transistors and SSI, MSI, LSI integrated circuits)
Mainframes: IBM 7090, IBM 7080, IBM System/360, BUNCH
Minicomputers: PDP-8, PDP-11, IBM System/32, IBM System/36

Fourth Generation (VLSI integrated circuits)
Minicomputers: VAX, IBM System i
4-bit microcomputers: Intel 4004, Intel 4040
8-bit microcomputers: Intel 8008, Intel 8080, Motorola 6800, Motorola 6809,
MOS Technology 6502, Zilog Z80
16-bit microcomputers: Intel 8088, Zilog Z8000, WDC 65816/65802
32-bit microcomputers: Intel 80386, Pentium, Motorola 68000, ARM architecture
64-bit microcomputers:[38] Alpha, MIPS, PA-RISC, PowerPC, SPARC, x86-64
Embedded computers: Intel 8048, Intel 8051
Personal computers: Desktop computer, Home computer, Laptop computer, Personal digital
assistant (PDA), Portable computer, Tablet PC, Wearable computer

Theoretical/experimental: Quantum computer, Chemical computer, DNA computing,
Optical computer, Spintronics-based computer

Other hardware topics
Peripheral devices (input/output):
Input: Mouse, Keyboard, Joystick, Image scanner, Webcam, Graphics tablet, Microphone
Output: Monitor, Printer, Loudspeaker
Both: Floppy disk drive, Hard disk drive, Optical disc drive, Teleprinter
Computer busses:
Short range: RS-232, SCSI, PCI, USB
Long range (computer networking): Ethernet, ATM, FDDI
Software
Main article: Computer software
Software refers to parts of the computer which do not have a material form, such as
programs, data, protocols, etc. When software is stored in hardware that cannot easily be
modified (such as BIOS ROM in an IBM PC compatible), it is sometimes called "firmware"
to indicate that it falls into an uncertain area somewhere between hardware and software.
Computer software
Operating system:
Unix and BSD: UNIX System V, IBM AIX, HP-UX, Solaris (SunOS), IRIX, List of BSD
operating systems
GNU/Linux: List of Linux distributions, Comparison of Linux distributions
Microsoft Windows: Windows 95, Windows 98, Windows NT, Windows 2000, Windows XP,
Windows Vista, Windows 7
DOS: 86-DOS (QDOS), PC-DOS, MS-DOS, DR-DOS, FreeDOS
Mac OS: Mac OS classic, Mac OS X
Embedded and real-time: List of embedded operating systems
Experimental: Amoeba, Oberon/Bluebottle, Plan 9 from Bell Labs

Library:
Multimedia: DirectX, OpenGL, OpenAL
Programming library: C standard library, Standard Template Library

Data:
Protocol: TCP/IP, Kermit, FTP, HTTP, SMTP
File format: HTML, XML, JPEG, MPEG, PNG

User interface:
Graphical user interface (WIMP): Microsoft Windows, GNOME, KDE, QNX Photon, CDE,
GEM, Aqua
Text-based user interface: Command-line interface, Text user interface

Application:
Office suite: Word processing, Desktop publishing, Presentation program, Database
management system, Scheduling & time management, Spreadsheet, Accounting software
Internet access: Browser, E-mail client, Web server, Mail transfer agent, Instant messaging
Design and manufacturing: Computer-aided design, Computer-aided manufacturing, Plant
management, Robotic manufacturing, Supply chain management
Graphics: Raster graphics editor, Vector graphics editor, 3D modeler, Animation editor,
3D computer graphics, Video editing, Image processing
Audio: Digital audio editor, Audio playback, Mixing, Audio synthesis, Computer music
Software engineering: Compiler, Assembler, Interpreter, Debugger, Text editor, Integrated
development environment, Software performance analysis, Revision control, Software
configuration management
Educational: Edutainment, Educational game, Serious game, Flight simulator
Games: Strategy, Arcade, Puzzle, Simulation, First-person shooter, Platform, Massively
multiplayer, Interactive fiction
Misc: Artificial intelligence, Antivirus software, Malware scanner, Installer/Package
management systems, File manager
Programming languages
Main article: Programming language
Programming languages provide various ways of specifying programs for computers to run.
Unlike natural languages, programming languages are designed to permit no ambiguity and to
be concise. They are purely written languages and are often difficult to read aloud. They are
generally either translated into machine code by a compiler or an assembler before being run,
or translated directly at run time by an interpreter. Sometimes programs are executed by a
hybrid method of the two techniques. There are thousands of different programming
languages—some intended to be general purpose, others useful only for highly specialized
applications.
Programming languages
Lists of programming languages: Timeline of programming languages, List of programming
languages by category, Generational list of programming languages, List of programming
languages, Non-English-based programming languages
Commonly used assembly languages: ARM, MIPS, x86
Commonly used high-level programming languages: Ada, BASIC, C, C++, C#, COBOL,
Fortran, Java, Lisp, Pascal, Object Pascal
Commonly used scripting languages: Bourne script, JavaScript, Python, Ruby, PHP, Perl
Professions and organizations
As the use of computers has spread throughout society, there are an increasing number of
careers involving computers.
Computer-related professions
Hardware-related: Electrical engineering, Electronic engineering, Computer engineering,
Telecommunications engineering, Optical engineering, Nanoengineering
Software-related: Computer science, Desktop publishing, Human–computer interaction,
Information technology, Information systems, Computational science, Software engineering,
Video game industry, Web design
The need for computers to work well together and to be able to exchange information has
spawned the need for many standards organizations, clubs and societies of both a formal and
informal nature.
Organizations
Standards groups: ANSI, IEC, IEEE, IETF, ISO, W3C
Professional societies: ACM, AIS, IET, IFIP, BCS
Free/open source software groups: Free Software Foundation, Mozilla Foundation, Apache
Software Foundation
See also
Computability theory
Computer security
Computer insecurity
List of computer term etymologies
List of fictional computers
Pulse computation
Notes
1. ^ In 1946, ENIAC required an estimated 174 kW. By comparison, a modern laptop computer
may use around 30 W; nearly six thousand times less. "Approximate Desktop & Notebook
Power Usage". University of Pennsylvania.
http://www.upenn.edu/computing/provider/docs/hardware/powerusage.html. Retrieved 2009-
06-20.
2. ^ Early computers such as Colossus and ENIAC were able to process between 5 and 100
operations per second. A modern "commodity" microprocessor (as of 2007) can process
billions of operations per second, and many of these operations are more complicated and
useful than early computer operations. "Intel Core2 Duo Mobile Processor: Features". Intel
Corporation. http://www.intel.com/cd/channel/reseller/asmo-
na/eng/products/mobile/processors/core2duo_m/feature/index.htm. Retrieved 2009-06-20.
3. ^ computer, n.. Oxford English Dictionary (2 ed.). Oxford University Press. 1989.
http://dictionary.oed.com/. Retrieved 2009-04-10
4. ^ "Discovering How Greeks Computed in 100 B.C.". The New York Times. 31 July 2008.
http://www.nytimes.com/2008/07/31/science/31computer.html?hp. Retrieved 27 March 2010.
5. ^ "Heron of Alexandria". http://www.mlahanas.de/Greeks/HeronAlexandria2.htm. Retrieved
2008-01-15.
6. ^ a b Ancient Discoveries, Episode 11: Ancient Robots. History Channel.
http://www.youtube.com/watch?v=rxjbaQl0ad8. Retrieved 2008-09-06
7. ^ Howard R. Turner (1997), Science in Medieval Islam: An Illustrated Introduction, p. 184,
University of Texas Press, ISBN 0-292-78149-0
8. ^ Donald Routledge Hill, "Mechanical Engineering in the Medieval Near East", Scientific
American, May 1991, pp. 64–9 (cf. Donald Routledge Hill, Mechanical Engineering)
9. ^ From cave paintings to the internet HistoryofScience.com
10. ^ See: Anthony Hyman, ed., Science and Reform: Selected Works of Charles Babbage
(Cambridge, England: Cambridge University Press, 1989), page 298. It is in the collection of
the Science Museum in London, England. (Delve (2007), page 99.)
11. ^ The analytical engine should not be confused with Babbage's difference engine which was a
non-programmable mechanical calculator.
12. ^ "Columbia University Computing History: Herman Hollerith". Columbia.edu.
http://www.columbia.edu/acis/history/hollerith.html. Retrieved 2010-12-11.
13. ^ a b "Alan Turing – Time 100 People of the Century". Time Magazine.
http://205.188.238.181/time/time100/scientist/profile/turing.html. Retrieved 2009-06-13. "The
fact remains that everyone who taps at a keyboard, opening a spreadsheet or a word-
processing program, is working on an incarnation of a Turing machine"
14. ^ "Atanasoff-Berry Computer".
http://energysciencenews.com/phpBB3/viewtopic.php?f=1&t=98&p=264#p264. Retrieved
2010-11-20.
15. ^ "Spiegel: The inventor of the computer's biography was published". Spiegel.de. 2009-09-28.
http://www.spiegel.de/netzwelt/gadgets/0,1518,651776,00.html. Retrieved 2010-12-11.
16. ^ "Inventor Profile: George R. Stibitz". National Inventors Hall of Fame Foundation, Inc..
http://www.invent.org/hall_of_fame/140.html.
17. ^ Rojas, R. (1998). "How to make Zuse's Z3 a universal computer". IEEE Annals of the
History of Computing 20 (3): 51–54. doi:10.1109/85.707574.
18. ^ B. Jack Copeland, ed., Colossus: The Secrets of Bletchley Park's Codebreaking Computers,
Oxford University Press, 2006
19. ^ ""Robot Mathematician Knows All The Answers", October 1944, Popular Science".
Books.google.com.
http://books.google.com/books?id=PyEDAAAAMBAJ&pg=PA86&dq=motor+gun+boat&hl=
en&ei=LxTqTMfGI4-
bnwfEyNiWDQ&sa=X&oi=book_result&ct=result&resnum=6&ved=0CEIQ6AEwBQ#v=one
page&q=motor%20gun%20boat&f=true. Retrieved 2010-12-11.
20. ^ Lavington 1998, p. 37
21. ^ This program was written similarly to those for the PDP-11 minicomputer and shows some
typical things a computer can do. All the text after the semicolons are comments for the
benefit of human readers. These have no significance to the computer and are ignored. (Digital
Equipment Corporation 1972)
22. ^ It is not universally true that bugs are solely due to programmer oversight. Computer
hardware may fail or may itself have a fundamental problem that produces unexpected results
in certain situations. For instance, the Pentium FDIV bug caused some Intel microprocessors
in the early 1990s to produce inaccurate results for certain floating point division operations.
This was caused by a flaw in the microprocessor design and resulted in a partial recall of the
affected devices.
23. ^ Even some later computers were commonly programmed directly in machine code. Some
minicomputers like the DEC PDP-8 could be programmed directly from a panel of switches.
However, this method was usually used only as part of the booting process. Most modern
computers boot entirely automatically by reading a boot program from some non-volatile
memory.
24. ^ However, there is sometimes some form of machine language compatibility between
different computers. An x86-64 compatible microprocessor like the AMD Athlon 64 is able to
run most of the same programs that an Intel Core 2 microprocessor can, as well as programs
designed for earlier microprocessors like the Intel Pentiums and Intel 80486. This contrasts
with very early commercial computers, which were often one-of-a-kind and totally
incompatible with other computers.
25. ^ High level languages are also often interpreted rather than compiled. Interpreted languages
are translated into machine code on the fly, while running, by another program called an
interpreter.
26. ^ The control unit's role in interpreting instructions has varied somewhat in the past. Although
the control unit is solely responsible for instruction interpretation in most modern computers,
this is not always the case. Many computers include some instructions that may only be
partially interpreted by the control system and partially interpreted by another device. This is
especially the case with specialized computing hardware that may be partially self-contained.
For example, EDVAC, one of the earliest stored-program computers, used a central control
unit that only interpreted four instructions. All of the arithmetic-related instructions were
passed on to its arithmetic unit and further decoded there.
27. ^ Instructions often occupy more than one memory address, so the program counter usually
increases by the number of memory locations required to store one instruction.
28. ^ David J. Eck (2000). The Most Complex Machine: A Survey of Computers and Computing.
A K Peters, Ltd.. p. 54. ISBN 9781568811284.
29. ^ Erricos John Kontoghiorghes (2006). Handbook of Parallel Computing and Statistics. CRC
Press. p. 45. ISBN 9780824740672.
30. ^ Flash memory also may only be rewritten a limited number of times before wearing out,
making it less useful for heavy random access usage. (Verma & Mielke 1988)
31. ^ Donald Eadie (1968). Introduction to the Basic Computer. Prentice-Hall. p. 12.
32. ^ Arpad Barna; Dan I. Porat (1976). Introduction to Microcomputers and the
Microprocessors. Wiley. p. 85. ISBN 9780471050513.
33. ^ Jerry Peek; Grace Todino, John Strang (2002). Learning the UNIX Operating System: A
Concise Guide for the New User. O'Reilly. p. 130. ISBN 9780596002619.
34. ^ Gillian M. Davis (2002). Noise Reduction in Speech Applications. CRC Press. p. 111.
ISBN 9780849309496.
35. ^ However, it is also very common to construct supercomputers out of many pieces of cheap
commodity hardware; usually individual computers connected by networks. These so-called
computer clusters can often provide supercomputer performance at a much lower cost than
customized designs. While custom architectures are still used for most of the most powerful
supercomputers, there has been a proliferation of cluster computers in recent years. (TOP500
2006)
36. ^ Agatha C. Hughes (2000). Systems, Experts, and Computers. MIT Press. p. 161.
ISBN 9780262082853. "The experience of SAGE helped make possible the first truly large-
scale commercial real-time network: the SABRE computerized airline reservations system..."
37. ^ "A Brief History of the Internet". Internet Society.
http://www.isoc.org/internet/history/brief.shtml. Retrieved 2008-09-20.
38. ^ Most major 64-bit instruction set architectures are extensions of earlier designs. All of the
architectures listed in this table, except for Alpha, existed in 32-bit forms before their 64-bit
incarnations were introduced.
References
Kempf, Karl (1961). Historical Monograph: Electronic Computers Within the
Ordnance Corps. Aberdeen Proving Ground (United States Army). http://ed-
thelen.org/comp-hist/U-S-Ord-61.html.
Phillips, Tony (2000). "The Antikythera Mechanism I". American Mathematical
Society. http://www.math.sunysb.edu/~tony/whatsnew/column/antikytheraI-
0400/kyth1.html. Retrieved 2006-04-05.
Shannon, Claude Elwood (1940). A symbolic analysis of relay and switching
circuits. Massachusetts Institute of Technology. http://hdl.handle.net/1721.1/11173.
Digital Equipment Corporation (1972) (PDF). PDP-11/40 Processor Handbook.
Maynard, MA: Digital Equipment Corporation.
http://bitsavers.vt100.net/dec/www.computer.museum.uq.edu.au_mirror/D-09-
30_PDP11-40_Processor_Handbook.pdf.
Verma, G.; Mielke, N. (1988). Reliability performance of ETOX based flash
memories. IEEE International Reliability Physics Symposium.
Meuer, Hans; Strohmaier, Erich; Simon, Horst; Dongarra, Jack (2006-11-13).
"Architectures Share Over Time". TOP500.
http://www.top500.org/lists/2006/11/overtime/Architectures. Retrieved 2006-11-27.
Lavington, Simon (1998). A History of Manchester Computers (2 ed.). Swindon: The
British Computer Society. ISBN 0902505018
Stokes, Jon (2007). Inside the Machine: An Illustrated Introduction to
Microprocessors and Computer Architecture. San Francisco: No Starch Press.
ISBN 978-1-59327-104-6.
External links
A Brief History of Computing - slideshow by Life magazine
CPU design
CPU design is the design engineering task of creating a central processing unit (CPU), a
component of computer hardware. It is a subfield of electronics engineering and computer
engineering.
Contents
1 Overview
2 Goals
3 Performance analysis and benchmarking
4 Markets
   4.1 General purpose computing
      4.1.1 High-end processor economics
   4.2 Scientific computing
   4.3 Embedded design
      4.3.1 Embedded processor economics
      4.3.2 Research and educational CPU design
      4.3.3 Soft microprocessor cores
5 Micro-architectural concepts
6 Integrated heat spreader
7 Research Topics
8 References
9 See also
Overview
CPU design focuses on these areas:
1. datapaths (such as ALUs and pipelines)
2. control unit: logic which controls the datapaths
3. Memory components such as register files, caches
4. Clock circuitry such as clock drivers, PLLs, clock distribution networks
5. Pad transceiver circuitry
6. Logic gate cell library which is used to implement the logic
CPUs designed for high-performance markets might require custom designs for each of these
items to achieve frequency, power-dissipation, and chip-area goals.
CPUs designed for lower performance markets might lessen the implementation burden by:
Acquiring some of these items by purchasing them as intellectual property
Using control logic implementation techniques (logic synthesis with CAD tools) to
implement the other components - datapaths, register files, clocks
Common logic styles used in CPU design include:
Unstructured random logic
Finite-state machines
Microprogramming (common from 1965 to 1985)
Programmable logic array (common in the 1980s, no longer common)
Device types used to implement the logic include:
Transistor-transistor logic Small Scale Integration logic chips - no longer used for
CPUs
Programmable Array Logic and Programmable logic devices - no longer used for
CPUs
Emitter-coupled logic (ECL) gate arrays - no longer common
CMOS gate arrays - no longer used for CPUs
CMOS ASICs - the usual choice today; so common that the term ASIC
is rarely applied to CPUs
Field-programmable gate arrays (FPGA) - common for soft microprocessors, and
more or less required for reconfigurable computing
A CPU design project generally has these major tasks:
Programmer-visible instruction set architecture, which can be implemented by a
variety of microarchitectures
Architectural study and performance modeling in ANSI C/C++ or SystemC
High-level synthesis (HLS) or RTL (e.g. logic) implementation
RTL Verification
Circuit design of speed critical components (caches, registers, ALUs)
Logic synthesis or logic-gate-level design
Timing analysis to confirm that all logic and circuits will run at the specified operating
frequency
Physical design including floorplanning, place and route of logic gates
Checking that RTL, gate-level, transistor-level and physical-level representations are
equivalent
Checks for signal integrity, chip manufacturability
As with most complex electronic designs, the logic verification effort (proving that the design
does not have bugs) now dominates the project schedule of a CPU.
Key CPU architectural innovations include index register, cache, virtual memory, instruction
pipelining, superscalar, CISC, RISC, virtual machine, emulators, microprogram, and stack.
Goals
The first CPUs were designed to do mathematical calculations faster and more reliably than
human computers.[1]
Each successive generation of CPU might be designed to achieve some of these goals:
higher performance levels of a single program or thread
higher throughput levels of multiple programs/threads
less power consumption for the same performance level
lower cost for the same performance level
greater connectivity to build larger, more parallel systems
more specialization to aid in specific targeted markets
Re-designing a CPU core to a smaller die-area helps achieve several of these goals.
Shrinking everything (a "photomask shrink"), resulting in the same number of
transistors on a smaller die, improves performance (smaller transistors switch faster),
reduces power (smaller wires have less parasitic capacitance) and reduces cost (more
CPUs fit on the same wafer of silicon).
Releasing a CPU on the same size die, but with a smaller CPU core, keeps the cost
about the same but allows higher levels of integration within one VLSI chip
(additional cache, multiple CPUs, or other components), improving performance and
reducing overall system cost.
Performance analysis and benchmarking
Main article: Computer performance
Because there are too many programs to test a CPU's speed on all of them, benchmarks were
developed. The most famous benchmarks are the SPECint and SPECfp benchmarks
developed by Standard Performance Evaluation Corporation and the ConsumerMark
benchmark developed by the Embedded Microprocessor Benchmark Consortium EEMBC.
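The shape of such a measurement is easy to show, even though real suites use large
standardized workloads. The Python sketch below times a fixed piece of work and reports a
rate; the loop body is an arbitrary stand-in, and the resulting number characterizes only this
toy workload on one machine.

# Sketch of a micro-benchmark: time fixed work, report operations/second.
import time

N = 10_000_000
start = time.perf_counter()
total = 0
for i in range(N):
    total += i                          # the "work" being measured
elapsed = time.perf_counter() - start
print(f"{N / elapsed:,.0f} loop iterations per second")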
Some important measurements include:
Instructions per second - Most consumers pick a computer architecture (normally Intel
IA32 architecture) to be able to run a large base of pre-existing pre-compiled software.
Being relatively uninformed on computer benchmarks, some of them pick a particular
CPU based on operating frequency (see Megahertz Myth).
FLOPS - The number of floating point operations per second is often important in
selecting computers for scientific computations.
Performance per watt - System designers building parallel computers, such as Google,
pick CPUs based on their speed per watt of power, because the cost of powering the
CPU outweighs the cost of the CPU itself.[1][2]
Some system designers building parallel computers pick CPUs based on the speed per
dollar.
System designers building real-time computing systems want to guarantee worst-case
response. That is easier to do when the CPU has low interrupt latency and when it has
a deterministic response (as with a DSP).
Computer programmers who program directly in assembly language want a CPU to
support a full featured instruction set.
Low power - For systems with limited power sources (e.g. solar, batteries, human
power).
Small size or low weight - for portable embedded systems, systems for spacecraft.
Environmental impact - Minimizing the environmental impact of computers during
manufacturing and recycling, as well as during use: reducing waste and hazardous
materials (see Green computing).
Some of these measures conflict. In particular, many design techniques that make a CPU run
faster make the "performance per watt", "performance per dollar", and "deterministic
response" much worse, and vice versa.
Markets
There are several different markets in which CPUs are used. Since each of these markets
differs in its requirements for CPUs, devices designed for one market are in most cases
inappropriate for the others.
General purpose computing
The vast majority of revenue generated from CPU sales is for general-purpose computing:
desktop, laptop, and server computers commonly used in businesses and homes. In this
market, the Intel IA-32 architecture dominates, with its rivals PowerPC and SPARC
maintaining much smaller customer bases. Yearly, hundreds of millions of IA-32
architecture CPUs are used by this market.
Since these devices are used to run countless different types of programs, these CPU designs
are not specifically targeted at one type of application or one function. The demands of running
a wide range of programs efficiently have made these CPU designs among the more advanced
technically, at the cost of being relatively expensive and having high power consumption.
High-end processor economics
In 1984, most high-performance CPUs required four to five years to develop.[2]
Developing new, high-end CPUs is a very costly proposition. Both the logical complexity
(needing very large logic design and logic verification teams and simulation farms with
perhaps thousands of computers) and the high operating frequencies (needing large circuit
design teams and access to the state-of-the-art fabrication process) account for the high cost of
design for this type of chip. The design cost of a high-end CPU will be on the order of US
$100 million. Since the design of such high-end chips nominally takes about five years to
complete, to stay competitive a company has to fund at least two of these large design teams
to release products at the rate of 2.5 years per product generation.
As an example, the typical loaded cost for one computer engineer is often quoted as
US$250,000 per year, including salary, benefits, CAD tools, computers, office space rent, etc.
Assuming that 100 engineers are needed to design a CPU and that the project takes four years,
the total cost is $250,000 per engineer-year x 100 engineers x 4 years = $100,000,000.
The above amount is just an example; the design teams for modern-day general-purpose
CPUs have several hundred members.
Scientific computing
Main article: Supercomputer
A much smaller niche market (in revenue and units shipped) is scientific computing, used in
government research labs and universities. Previously much CPU design was done for this
market, but the cost-effectiveness of using mass-market CPUs has curtailed almost all
specialized designs for it. The main remaining area of active hardware design and
research for scientific computing is high-speed system interconnects.
Embedded design
As measured by units shipped, most CPUs are embedded in other machinery, such as
telephones, clocks, appliances, vehicles, and infrastructure. Embedded processors sell in
volumes of many billions of units per year, though mostly at much lower price points than
general-purpose processors.
These single-function devices differ from the more familiar general-purpose CPUs in several
ways:
Low cost is of utmost importance.
It is important to maintain low power dissipation, as embedded devices often have a
limited battery life and it is often impractical to include cooling fans.
To lower system cost, peripherals are integrated with the processor on the same
silicon chip.
Keeping peripherals on-chip also reduces power consumption, as external GPIO ports
typically require buffering so that they can source or sink the relatively high current
loads required to maintain a strong signal outside the chip.
o Many embedded applications have a limited amount of physical space for
circuitry; keeping peripherals on-chip reduces the space required for the
circuit board.
o The program and data memories are often integrated on the same chip. When
the only allowed program memory is ROM, the device is known as a
microcontroller.
For many embedded applications, interrupt latency is more critical than in some
general-purpose processors.
Embedded processor economics
As of 2009, more CPUs are produced using the ARM architecture instruction set than any
other 32-bit instruction set. The ARM architecture and the first ARM chip were designed in
about one and a half years and 5 man years of work time.[3]
The 32-bit Parallax Propeller microcontroller architecture and the first chip were designed by
two people in about 10 man years of work time.[4]
The 8-bit AVR architecture and the first AVR microcontroller are said to have been
conceived and designed by two students at the Norwegian Institute of Technology.
The 8-bit 6502 architecture and the first MOS Technology 6502 chip were designed in 13
months by a group of about 9 people.[5]
Research and educational CPU design
The 32-bit Berkeley RISC I and RISC II architectures and the first chips were mostly designed
by a series of students as part of a four-quarter sequence of graduate courses.[6] This design
became the basis of the commercial SPARC processor design.
For about a decade, every student taking the 6.004 class at MIT was part of a team; each
team had one semester to design and build a simple 8-bit CPU out of 7400-series integrated
circuits. One team of four students designed and built a simple 32-bit CPU during that semester.[7]
Some undergraduate courses require a team of 2 to 5 students to design, implement, and test a
simple CPU in an FPGA in a single 15-week semester.[8]
Soft microprocessor cores
For embedded systems, the highest performance levels are often not needed or desired, due to
power-consumption requirements. This allows the use of processors which can be totally
implemented by logic synthesis techniques. These synthesized processors can be implemented
in a much shorter amount of time, giving quicker time-to-market.
Main article: Soft microprocessor
Micro-architectural concepts
Main article: Microarchitecture
Integrated heat spreader
An integrated heat spreader (IHS) is usually made of copper covered with nickel plating.
Research Topics
Main article: History of general purpose CPUs#1990 to today: looking forward
A variety of new CPU design ideas have been proposed, including reconfigurable logic,
clockless CPUs, and optical computing.
References
1. ^ Brian Randell: The Origins of Digital Computers. Berlin: Springer, 1973. ISBN 0-387-06169
2. ^ "New system manages hundreds of transactions per second", article by Robert Horst and Sandra Metz of Tandem Computers Inc., Electronics magazine, 19 April 1984: "While most high-performance CPUs require four to five years to develop, the NonStop TXP processor took just 2+1/2 years -- six months to develop a complete written specification, one year to construct a working prototype, and another year to reach volume production."
3. ^ "ARM's way", 1998
4. ^ "Why the Propeller Works" by Chip Gracey
5. ^ "Interview with William Mensch"
6. ^ "Design and Implementation of RISC I", original journal article by C.E. Sequin and D.A. Patterson
7. ^ "the VHS"
8. ^ "Teaching Computer Design with FPGAs" by Jan Gray
Notes
Hwang, Enoch (2006). Digital Logic and Microprocessor Design with VHDL.
Thomson. ISBN 0-534-46593-5. http://faculty.lasierra.edu/~ehwang/digitaldesign.
Processor Design: An Introduction - Detailed introduction to microprocessor design.
Somewhat incomplete and outdated, but still worthwhile.
See also
Central processing unit
History of general purpose CPUs
Microprocessor
Microarchitecture
Moore's law
Amdahl's law
System-on-a-chip
Reduced instruction set computer
Complex instruction set computer
Minimal instruction set computer
Electronic design automation
High-level synthesis
Von Neumann architecture
Schematic of the von Neumann architecture. The Control Unit and Arithmetic Logic Unit
form the main components of the Central Processing Unit (CPU)
The von Neumann architecture is a design model for a stored-program digital computer that
uses a central processing unit (CPU) and a single separate storage structure ("memory") to
hold both instructions and data. It is named after the mathematician and early computer
scientist John von Neumann. Such computers implement a universal Turing machine and have
a sequential architecture.
A stored-program digital computer is one that keeps its programmed instructions, as well as
its data, in read-write, random-access memory (RAM). Stored-program computers were an
advancement over the program-controlled computers of the 1940s, such as the Colossus and
the ENIAC, which were programmed by setting switches and inserting patch leads to route
data and to control signals between various functional units. In the vast majority of modern
computers, the same memory is used for both data and program instructions. The mechanisms
for transferring the data and instructions between the CPU and memory are, however,
considerably more complex than the original von Neumann architecture.
The terms "von Neumann architecture" and "stored-program computer" are generally used
interchangeably, and that usage is followed in this article.
Description
The earliest computing machines had fixed programs. Some very simple computers still use
this design, either for simplicity or training purposes. For example, a desk calculator (in
principle) is a fixed program computer. It can do basic mathematics, but it cannot be used as a
word processor or a gaming console. Changing the program of a fixed-program machine
requires re-wiring, re-structuring, or re-designing the machine. The earliest computers were
not so much "programmed" as they were "designed". "Reprogramming", when it was possible
at all, was a laborious process, starting with flowcharts and paper notes, followed by detailed
engineering designs, and then the often-arduous process of physically re-wiring and re-
building the machine. It could take three weeks to set up a program on ENIAC and get it
working.[1]
The idea of the stored-program computer changed all that: a computer that by design includes
an instruction set and can store in memory a set of instructions (a program) that details the
computation.
A stored-program design also lets programs modify themselves while running. One early
motivation for such a facility was the need for a program to increment or otherwise modify
the address portion of instructions, which had to be done manually in early designs. This
became less important when index registers and indirect addressing became usual features of
machine architecture. Self-modifying code has largely fallen out of favor, since it is usually
hard to understand and debug, as well as being inefficient under modern processor pipelining
and caching schemes.
On a large scale, the ability to treat instructions as data is what makes assemblers, compilers
and other automated programming tools possible. One can "write programs which write
programs".[2]
On a smaller scale, I/O-intensive machine instructions, such as the BITBLT primitive used to
modify images on a bitmap display, were once thought to be impossible to implement without
custom hardware. It was shown later that these instructions could be implemented efficiently
by "on the fly compilation" ("just-in-time compilation") technology, e.g., code-generating
programs, one form of self-modifying code that has remained popular.
There are drawbacks to the von Neumann design. Aside from the von Neumann bottleneck
described below, program modifications can be quite harmful, either by accident or design. In
some simple stored-program computer designs, a malfunctioning program can damage itself,
other programs, or the operating system, possibly leading to a computer crash. Memory
protection and other forms of access control can usually protect against both accidental and
malicious program modification.
Development of the stored-program concept
The mathematician Alan Turing, who had been alerted to a problem of mathematical logic by
the lectures of Max Newman at the University of Cambridge, wrote a paper in 1936 entitled
On Computable Numbers, with an Application to the Entscheidungsproblem, which was
published in the Proceedings of the London Mathematical Society.[3]
In it he described a
hypothetical machine which he called a "universal computing machine", and which is now
known as the "universal Turing machine". The hypothetical machine had an infinite store
(memory in today's terminology) that contained both instructions and data. The German
engineer Konrad Zuse independently wrote about this concept in 1936.[4]
John von Neumann
became acquainted with Turing when he was a visiting professor at Cambridge in 1935 and
also during the year that Turing spent at Princeton University in 1936-37. Whether he knew of
Turing's 1936 paper at that time is not clear.
Independently, J. Presper Eckert and John Mauchly, who were developing the ENIAC at the
Moore School of Electrical Engineering, at the University of Pennsylvania, wrote about the
stored-program concept in December 1943.[5][6]
In planning a new machine, EDVAC, Eckert
wrote in January 1944 that they would store data and programs in a new addressable memory
device, a mercury metal delay line memory. This was the first time the construction of a
practical stored-program machine was proposed. At that time, they were not aware of Turing's work.
Von Neumann was involved in the Manhattan Project at the Los Alamos National Laboratory,
which required huge amounts of calculation. This drew him to the ENIAC project, in the
summer of 1944. There he joined into the ongoing discussions on the design of this stored-
program computer, the EDVAC. As part of that group, he volunteered to write up a
description of it. The term "von Neumann architecture" arose from von Neumann's paper
First Draft of a Report on the EDVAC dated 30 June 1945, which included ideas from Eckert
and Mauchly. It was unfinished when his colleague Herman Goldstine circulated it with only
von Neumann's name on it, to the consternation of Eckert and Mauchly.[7]
The paper was read
by dozens of von Neumann's colleagues in America and Europe, and influenced the next
round of computer designs.
Von Neumann was, then, not alone in putting forward the idea of the stored-program
architecture, and Jack Copeland considers that it is "historically inappropriate, to refer to
electronic stored-program digital computers as 'von Neumann machines'".[8]
His Los Alamos
colleague Stan Frankel said of his regard for Turing's ideas:
I know that in or about 1943 or '44 von Neumann was well aware of the
fundamental importance of Turing's paper of 1936 ... Von Neumann introduced
me to that paper and at his urging I studied it with care. Many people have
acclaimed von Neumann as the "father of the computer" (in a modern sense of
the term) but I am sure that he would never have made that mistake himself.
He might well be called the midwife, perhaps, but he firmly emphasized to me,
and to others I am sure, that the fundamental conception is owing to Turing—
in so far as not anticipated by Babbage ... Both Turing and von Neumann, of
course, also made substantial contributions to the "reduction to practice" of
these concepts but I would not regard these as comparable in importance with
the introduction and explication of the concept of a computer able to store in
its memory its program of activities and of modifying that program in the
course of these activities. [9]
Later, Turing produced a detailed technical report Proposed Electronic Calculator describing
the Automatic Computing Engine (ACE).[10]
He presented this to the Executive Committee of
the British National Physical Laboratory on 19 February 1946. Although Turing knew from
his wartime experience at Bletchley Park that what he proposed was feasible, the secrecy that
was maintained about Colossus for several decades prevented him from saying so. Various
successful implementations of the ACE design were produced.
Both von Neumann's and Turing's papers described stored program-computers, but von
Neumann's earlier paper achieved greater circulation and the computer architecture it outlined
became known as the "von Neumann architecture". In the 1953 book Faster than Thought
(edited by B.V. Bowden), a section in the chapter on Computers in America reads as
follows:[11]
THE MACHINE OF THE INSTITUTE FOR ADVANCED STUDIES,
PRINCETON
In 1945, Professor J. von Neumann, who was then working at the Moore
School of Engineering in Philadelphia, where the E.N.I.A.C. had been built,
issued on behalf of a group of his co-workers a report on the logical design of
digital computers. The report contained a fairly detailed proposal for the
design of the machine which has since become known as the E.D.V.A.C.
(electronic discrete variable automatic computer). This machine has only
recently been completed in America, but the von Neumann report inspired the
construction of the E.D.S.A.C. (electronic delay-storage automatic calculator)
in Cambridge (see page 130).
In 1947, Burks, Goldstine and von Neumann published another report which
outlined the design of another type of machine (a parallel machine this time)
which should be exceedingly fast, capable perhaps of 20,000 operations per
second. They pointed out that the outstanding problem in constructing such a
machine was in the development of a suitable memory, all the contents of
which were instantaneously accessible, and at first they suggested the use of a
special tube—called the Selectron, which had been invented by the Princeton
Laboratories of the R.C.A. These tubes were expensive and difficult to make, so
von Neumann subsequently decided to build a machine based on the Williams
memory. This machine, which was completed in June, 1952 in Princeton has
become popularly known as the Maniac. The design of this machine has
inspired that of half a dozen or more machines which are now being built in
America, all of which are known affectionately as "Johniacs".
In the same book, the first two paragraphs of a chapter on ACE read as follows:[12]
AUTOMATIC COMPUTATION AT THE NATIONAL PHYSICAL LABORATORY
One of the most modern digital computers which embodies developments and
improvements in the technique of automatic electronic computing was recently
demonstrated at the National Physical Laboratory, Teddington, where it has
been designed and built by a small team of mathematicians and electronics
research engineers on the staff of the Laboratory, assisted by a number of
production engineers from the English Electric Company, Limited. The
equipment so far erected at the Laboratory is only the pilot model of a much
larger installation which will be known as the Automatic Computing Engine,
but although comparatively small in bulk and containing only about 800
thermionic valves, as can be judged from Plates XII, XIII and XIV, it is an
extremely rapid and versatile calculating machine.
The basic concepts and abstract principles of computation by a machine were
formulated by Dr. A. M. Turing, F.R.S., in a paper read before the London
Mathematical Society in 1936, but work on such machines in Britain was
delayed by the war. In 1945, however, an examination of the problems was
made at the National Physical Laboratory by Mr. J. R. Womersley, then
superintendent of the Mathematics Division of the Laboratory. He was joined
by Dr. Turing and a small staff of specialists, and, by 1947, the preliminary
planning was sufficiently advanced to warrant the establishment of the special
group already mentioned. In April, 1948, the latter became the Electronics
Section of the Laboratory, under the charge of Mr. F. M. Colebrook.
Von Neumann bottleneck
The separation between the CPU and memory leads to the von Neumann bottleneck, the
limited throughput (data transfer rate) between the CPU and memory compared to the amount
of memory. In most modern computers, throughput is much smaller than the rate at which the
CPU can work. This seriously limits the effective processing speed when the CPU is required
to perform minimal processing on large amounts of data. The CPU is continuously forced to
wait for needed data to be transferred to or from memory. Since CPU speed and memory size
have increased much faster than the throughput between them, the bottleneck has become
more of a problem, a problem whose severity increases with every newer generation of CPU.
The term "von Neumann bottleneck" was coined by John Backus in his 1977 ACM Turing
Award lecture. According to Backus:
Surely there must be a less primitive way of making big changes in the store
than by pushing vast numbers of words back and forth through the von
Neumann bottleneck. Not only is this tube a literal bottleneck for the data
traffic of a problem, but, more importantly, it is an intellectual bottleneck that
has kept us tied to word-at-a-time thinking instead of encouraging us to think
in terms of the larger conceptual units of the task at hand. Thus programming
is basically planning and detailing the enormous traffic of words through the
von Neumann bottleneck, and much of that traffic concerns not significant data
itself, but where to find it.[13]
The performance problem can be alleviated (to some extent) by several mechanisms.
Providing a cache between the CPU and the main memory, providing separate caches with
separate access paths for data and instructions (the so-called Harvard architecture), and using
branch predictor algorithms and logic are three of the ways performance is increased. The
problem can also be sidestepped somewhat by using parallel computing, using for example
the NUMA architecture—this approach is commonly employed by supercomputers. It is less
clear whether the intellectual bottleneck that Backus criticized has changed much since 1977.
Backus's proposed solution has not had a major influence.
Modern functional
programming and object-oriented programming are much less geared towards "pushing vast
numbers of words back and forth" than earlier languages like Fortran were, but internally, that
is still what computers spend much of their time doing, even highly parallel supercomputers.
Early von Neumann-architecture computers
The First Draft described a design that was used by many universities and corporations to
construct their computers.[14]
Among these various computers, only ILLIAC and ORDVAC
had compatible instruction sets.
ORDVAC (U-Illinois) at Aberdeen Proving Ground, Maryland (completed Nov 1951[15])
IAS machine at Princeton University (Jan 1952)
MANIAC I at Los Alamos Scientific Laboratory (Mar 1952)
ILLIAC at the University of Illinois, (Sept 1952)
AVIDAC at Argonne National Laboratory (1953)
ORACLE at Oak Ridge National Laboratory (Jun 1953)
JOHNNIAC at RAND Corporation (Jan 1954)
BESK in Stockholm (1953)
BESM-1 in Moscow (1952)
DASK in Denmark (1955)
PERM in Munich (1956?)
SILLIAC in Sydney (1956)
WEIZAC in Rehovoth (1955)
Early stored-program computers
The date information in the following chronology is difficult to put into proper order. Some
dates are for first running a test program, some dates are the first time the computer was
demonstrated or completed, and some dates are for the first delivery or installation.
The IBM SSEC was a stored-program electromechanical computer, publicly demonstrated
on January 27, 1948. Being partially electromechanical, it was not fully electronic.
The Manchester SSEM (the Baby) was the first fully electronic computer to run a
stored program. It ran a factoring program for 52 minutes on June 21, 1948, after
running a simple division program and a program to show that two numbers were
relatively prime.
The ENIAC was modified to run as a primitive read-only stored-program computer
(using the Function Tables for program ROM) and was demonstrated as such on
September 16, 1948, running a program by Adele Goldstine for von Neumann.
The BINAC ran some test programs in February, March, and April 1949, although it
wasn't completed until September 1949.
The Manchester Mark 1 developed from the SSEM project. An intermediate version of
the Mark 1 was available to run programs in April 1949, but it wasn't completed until
October 1949.
The EDSAC ran its first program on May 6, 1949.
The EDVAC was delivered in August 1949, but it had problems that kept it from
being put into regular operation until 1951.
The CSIR Mk I ran its first program in November 1949.
The SEAC was demonstrated in April 1950.
The Pilot ACE ran its first program on May 10, 1950 and was demonstrated in
December 1950.
The SWAC was completed in July 1950.
The Whirlwind was completed in December 1950 and was in actual use in April 1951.
The first ERA Atlas (later the commercial ERA 1101/UNIVAC 1101) was installed in
December 1950.
Non-von Neumann processors
The NEC µPD7281D pixel processor was the first non-von Neumann microprocessor.
Perhaps the most common kind of non-von Neumann structure used in modern computers is
content-addressable memory (CAM).
In some cases, emerging memristor technology may be able to circumvent the von Neumann
bottleneck.[16]
See also
Harvard architecture
Modified Harvard architecture
Turing machine
Random access machine
Little man computer
CARDboard Illustrative Aid to Computation
Von Neumann syndrome
Interconnect bottleneck
References
Inline
1. ^ Copeland (2006) p. 104.
2. ^ MFTL (My Favorite Toy Language) entry Jargon File 4.4.7,
http://catb.org/~esr/jargon/html/M/MFTL.html, retrieved 2008-07-11
3. ^ Turing, A.M. (1936), "On Computable Numbers, with an Application to the
Entscheidungsproblem", Proceedings of the London Mathematical Society, 2 42: 230–65,
1937, doi:10.1112/plms/s2-42.1.230 (and Turing, A.M. (1938), "On Computable Numbers,
with an Application to the Entscheidungsproblem: A correction", Proceedings of the London
Mathematical Society, 2 43: 544–6, 1937, doi:10.1112/plms/s2-43.6.544)
4. ^ The Life and Work of Konrad Zuse Part 10: Konrad Zuse and the Stored Program
Computer, archived from the original on June 1, 2008,
http://web.archive.org/web/20080601160645/http://www.epemag.com/zuse/part10.htm,
retrieved 2008-07-11
5. ^ Lukoff, Herman (1979), From Dits to Bits...: A Personal History of the Electronic
Computer, Robotics Press, ISBN 978-0-89661-002-6
6. ^ ENIAC project administrator Grist Brainerd's December 1943 progress report for the first
period of the ENIAC's development implicitly proposed the stored program concept (while
simultaneously rejecting its implementation in the ENIAC) by stating that "in order to have
the simplest project and not to complicate matters" the ENIAC would be constructed without
any "automatic regulation".
7. ^ Copeland (2006) p. 113
8. ^ Copeland, Jack (2000), A Brief History of Computing: ENIAC and EDVAC, http://www.alanturing.net/turing_archive/pages/Reference%20Articles/BriefHistofComp.html#ACE, retrieved 27 January 2010
9. ^ Copeland, Jack (2000), A Brief History of Computing: ENIAC and EDVAC, http://www.alanturing.net/turing_archive/pages/Reference%20Articles/BriefHistofComp.html#ACE, retrieved 27 January 2010, which cites Randell, B. (1972), Meltzer, B.; Michie, D., eds., "On Alan Turing and the Origins of Digital Computers", Machine Intelligence 7 (Edinburgh: Edinburgh University Press): 10, ISBN 0902383264
10. ^ Copeland (2006) pp. 108-111
11. ^ Bowden (1953) pp. 176,177
12. ^ Bowden (1953) p. 135
13. ^ E. W. Dijkstra Archive: A review of the 1977 Turing Award Lecture,
http://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD692.html, retrieved 2008-07-
11
14. ^ Electronic Computer Project, http://www.ias.edu/spfeatures/john_von_neumann/electronic-
computer-project/[dead link]
15. ^ Illiac Design Techniques, report number UIUCDCS-R-1955-146, Digital Computer
Laboratory, University of Illinois at Urbana-Champaign, 1955
16. ^ Mouttet, Blaise L (2009), "Memristor Pattern Recognition Circuit Architecture for
Robotics", Proceedings of the 2nd International Multi-Conference on Engineering and
Technological Innovation II: 65–70,
http://www.iiis.org/CDs2008/CD2009SCI/CITSA2009/PapersPdf/I086AI.pdf
General
Bowden, B.V., ed. (1953), "Computers in America", Faster Than Thought: A
Symposium on Digital Computing Machines, London: Sir Isaac Pitman and Sons Ltd.
Rojas, Raúl; Hashagen, Ulf, eds. (2000), The First Computers: History and
Architectures, MIT Press, ISBN 0-262-18197-5
Davis, Martin (2000), The universal computer: the road from Leibniz to Turing, New
York: W W Norton & Company Inc., ISBN 0-393-04785-7
Can Programming be Liberated from the von Neumann Style?, John Backus, 1977
ACM Turing Award Lecture. Communications of the ACM, August 1978, Volume
21, Number 8. Online PDF
C. Gordon Bell and Allen Newell (1971), Computer Structures: Readings and
Examples, McGraw-Hill Book Company, New York. Massive (668 pages).
Copeland, Jack (2006), "Colossus and the Rise of the Modern Computer", in
Copeland, B. Jack, Colossus: The Secrets of Bletchley Park's Codebreaking
Computers, Oxford: Oxford University Press, ISBN 978-0-19-284055-4.
External links
Harvard vs von Neumann
A tool that emulates the behavior of a von Neumann machine
Lecture 3
1. Computer components. Hardware and Software programming.
2. The main cycle of instruction processing (MCIP).
Literature.
1. Stallings W. Computer Organization and Architecture. Designing and performance, 5th ed. - Upper Saddle River, NJ : Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer organization, 4th ed. - McGRAW-HILL INTERNATIONAL EDITIONS, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. - Upper Saddle River, NJ : Prentice Hall, 2002.
Programming in Hardware (Hardwired Program): a customized sequence of arithmetic and logic functions transforms the input data directly into results.
Programming in Software: instruction codes are fed to an instruction interpreter, which produces the control signals that drive general-purpose arithmetic and logic functions over the data.
Program Concept
Hardwired systems are inflexible
General purpose hardware can do different tasks, given correct control signals
Instead of re-wiring, supply a new set of control signals
What is a program?
A sequence of steps
For each step, an arithmetic or logical operation is done
For each operation, a different set of control signals is needed
Function of Control Unit
For each operation a unique code is provided
e.g. ADD, MOVE
A hardware segment accepts the code and issues the control signals
We have a computer!
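As a rough illustration (the operation names and control-signal names below are invented for the example, not taken from any real machine), the control unit can be pictured as a lookup from operation code to a fixed set of control signals:

    # Toy illustration of a control unit: each operation code selects the
    # control signals that must be asserted to carry the operation out.
    CONTROL_STORE = {
        "ADD":  ("read_operand", "alu_add", "latch_ac"),
        "MOVE": ("read_operand", "latch_ac"),
    }

    def control_unit(opcode):
        for signal in CONTROL_STORE[opcode]:
            print("assert", signal)

    control_unit("ADD")   # asserts read_operand, alu_add, latch_ac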
Components
The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit
Data and instructions need to get into the system and results out
Input/output
Temporary storage of code and results is needed
Main memory
The CPU is typically in control. It exchanges data with memory using two internal (to the
CPU) registers: a memory address register (MAR), which specifies the address in memory for
the next read or write, and a memory buffer register (MBR), which contains the data to be
written into memory or receives the data read from memory. Similarly, an I/O address register
(I/OAR) specifies a particular I/O device, and an I/O buffer register (I/OBR) is used for the
exchange of data between an I/O module and the CPU.
A memory module consists of a set of locations, defined by sequentially numbered addresses.
Each location contains a binary number that can be interpreted as either an instruction or data.
An I/O module transfers data from external devices to the CPU and memory, and vice versa.
It contains internal buffers for temporarily holding the data until it can be sent on.
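A minimal sketch of this register discipline (the names MAR and MBR follow the text; everything else is illustrative):

    # Every memory read or write goes through the MAR/MBR register pair.
    class Memory:
        def __init__(self, size):
            self.cells = [0] * size

    class CPU:
        def __init__(self, memory):
            self.memory = memory
            self.MAR = 0   # address in memory for the next read or write
            self.MBR = 0   # word read from, or to be written into, memory

        def read(self, address):
            self.MAR = address
            self.MBR = self.memory.cells[self.MAR]    # MBR <- M(MAR)
            return self.MBR

        def write(self, address, word):
            self.MAR, self.MBR = address, word
            self.memory.cells[self.MAR] = self.MBR    # M(MAR) <- MBR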
Computer Components:
Top Level View
Instruction Cycle
Two steps: Fetch and Execute.
The instruction fetch consists of reading an instruction from a location in
the memory.
The instruction execution may involve several operations and depends on
the nature of the instruction.
Characteristics of the Hypothetical Machine:
Instruction format (16 bits): bits 0-3 - OpCode; bits 4-15 - Address.
Integer format (16 bits): bit 0 - sign S; bits 1-15 - Magnitude.
Internal CPU registers: Program Counter (PC) = address of instruction; Instruction Register (IR) = instruction being executed; Accumulator (AC) = temporary storage.
Partial list of OpCodes: 0001 = Load AC from Memory; 0010 = Store AC to Memory; 0101 = Add to AC from Memory.
The instruction code is a group of bits that instruct the computer to perform a specific
operation. It is usually divided into parts, each having its own particular interpretation.
The most basic part of an instruction code is its operation part, which defines such
operations as add, subtract, multiply, shift, complement.
The number of bits required for the operation code of an instruction depends on the total
number of operations available in the computer.
At this point we must recognize the relationship between a computer operation and a micro
operation. An operation is a part of an instruction stored in the computer memory. It is a binary
code that tells the computer to perform a specific operation. The control unit receives the
instruction from memory and interprets the operation code bits. It then issues a sequence of
control signals to initiate micro-operations in internal computer registers. For every operation
code, the control unit issues the sequence of micro-operations needed for the hardware
implementation of the specified operation. For this reason, an operation code is sometimes
called a macro-operation, because it specifies a set of micro-operations.
The operation must be performed on some data stored in processor registers or in memory.
An instruction code must therefore specify not only the operation but also the registers or the
memory words where the operands are to be found, as well as the register or memory word
where the result is to be stored. So the second part of the instruction code specifies an address,
which tells the control unit where to find the operand in memory. This operand is read from
memory and used as the data to be operated on, together with the data stored in the processor
register.
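For the 16-bit instruction word described above (a 4-bit opcode and a 12-bit address), packing and unpacking can be sketched as follows; the helper names are illustrative:

    # Pack and unpack the hypothetical machine's 16-bit instruction word.
    LOAD_AC, STORE_AC, ADD_AC = 0b0001, 0b0010, 0b0101

    def encode(opcode, address):
        return (opcode << 12) | (address & 0xFFF)

    def decode(word):
        return word >> 12, word & 0xFFF

    word = encode(LOAD_AC, 0x940)              # 0x1940: load AC from 940
    assert decode(word) == (LOAD_AC, 0x940)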
Fetch Cycle
The Program Counter (PC) holds the address of the next instruction to fetch.
The processor fetches the instruction from the memory location pointed to by the PC.
The PC is incremented (unless told otherwise).
The instruction is loaded into the Instruction Register (IR); the processor interprets the
instruction and performs the required actions.
Execute Cycle
Processor-memory: data transfer between the CPU and main memory.
Processor-I/O: data transfer between the CPU and an I/O module.
Data processing: some arithmetic or logical operation on data.
Control: alteration of the sequence of operations, e.g. a jump.
Or a combination of the above.
Example of Program Execution

(Instruction Cycle - State Diagram: the fetch and execute states alternate for every instruction.)

Consider a fragment of a program that adds the contents of memory word 940 to the contents of memory word 941 and stores the result in 941. Memory and the CPU registers (PC, AC, MAR, MBR, IR) initially hold:

300: 1940 (load AC from location 940)
301: 5941 (add the contents of location 941 to AC)
302: 2941 (store AC into location 941)
....
940: 0002
941: 0003

Three fetch-execute cycles are required:
1. The PC contains 300, the address of the first instruction. MAR <- 300, the word 1940 is read into MBR and then into IR, and the PC is incremented to 301 (instruction fetch). Execution: the address field 940 is placed in MAR, M(940) is read into MBR, and AC <- 0002 (loading of AC from memory).
2. The instruction 5941 is fetched from location 301 and the PC becomes 302. Execution: M(941) = 0003 is read into MBR and added to AC, giving AC = 0005; the number which has been read from memory is added to the contents of AC (3 + 2 = 5).
3. The instruction 2941 is fetched from location 302 and the PC becomes 303. Execution: the contents of AC (0005) are written through MBR into M(941), so location 941 now holds 0005.
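The trace above can be reproduced with a short simulator. The following is a minimal sketch, not the lecture's own tool: it assumes the opcodes from the partial list (0001 = load, 0010 = store, 0101 = add) and treats the addresses as hexadecimal.

    # Minimal simulator sketch of the hypothetical machine's fetch-execute cycle.
    memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
              0x940: 0x0002, 0x941: 0x0003}

    PC, AC = 0x300, 0
    while PC in memory:
        IR = memory[PC]                     # fetch: instruction into IR
        PC += 1                             # increment PC
        opcode, addr = IR >> 12, IR & 0xFFF
        if opcode == 0b0001:                # load AC from memory
            AC = memory[addr]
        elif opcode == 0b0010:              # store AC to memory
            memory[addr] = AC
        elif opcode == 0b0101:              # add to AC from memory
            AC = (AC + memory[addr]) & 0xFFFF

    print(hex(memory[0x941]))               # -> 0x5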
The operation of the computer consists of the periodic repetition of the main cycle of
instruction processing (MCIP). The following notation is used in the flowchart of the
algorithm:
M(X) - the contents of the memory location at address X;
(X:Y) - bits X through Y, numbered from the most significant to the least significant.
Each cycle consists of two phases: the fetch cycle and the execution cycle.
During the fetch cycle, the operation code of the next instruction is loaded into the
instruction register IR, and the contents of the address field of the same instruction are
loaded into the address register MAR. The instruction itself may be taken either from the
instruction buffer register IBR or from the memory M. In the latter case, the word read
from memory M is first loaded into the memory buffer register MBR, and its individual
fields are then transferred from there to the IBR, IR and MAR.
To simplify the electronic circuits that communicate with the memory unit, all memory
read and write operations are performed through a single pair of registers: one of them
holds the address of the location, the other the word (operand) being read from or written
to memory.
Once the operation code has been loaded into the instruction register IR, the turn of the
execution cycle comes. The control circuits decode the operation code and issue the
appropriate control signals, which synchronize the data transfers and the arithmetic or
logical operations performed by the ALU circuits.
The instruction set comprised 21 instructions, grouped as follows:
Data transfer instructions, which move data from a given memory location to one of
the two addressable ALU registers (the accumulator or the multiplier/quotient
register), or from these registers to a given memory location.
Branch instructions (conditional/unconditional), which change the natural order of
execution of the program's instructions.
Arithmetic instructions, which specify the four arithmetic operations (some
arithmetic instructions have modifications).
Instructions that modify the address part of an instruction, which make it possible to
modify a program by program means, replacing the initially set values of the address
fields in instructions.
Simplified Scheme of the MCIP in the IAS computer:

Start.
Fetch cycle:
  Is the next instruction already in IBR?
    No (a memory access is required): MAR <- PC; MBR <- M(MAR).
      Is the instruction in the left half of the word required?
        Yes: IR <- MBR(0:7); MAR <- MBR(8:19); IBR <- MBR(20:39).
        No: IR <- MBR(20:27); MAR <- MBR(28:39).
    Yes (no memory access required): IR <- IBR(0:7); MAR <- IBR(8:19).
  PC <- PC + 1.
Execution cycle - decoding of the instruction in IR:
  "Go to M(X,0:19)": PC <- MAR.
  "If AC >= 0, then go to M(X,0:19)": test AC >= 0; if so, PC <- MAR.
  "AC <- M(X)": MBR <- M(MAR); AC <- MBR.
  "AC <- AC + M(X)": MBR <- M(MAR); AC <- AC + MBR.
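The fetch phase of this flowchart can be rendered in a few lines of Python. This is a sketch under simplifying assumptions: IAS words are 40 bits holding two 20-bit instructions (an 8-bit opcode plus a 12-bit address each), and the left_required flag stands in for the logic that decides which half of the word is needed.

    # Sketch of the IAS fetch phase; register names follow the flowchart.
    def bits(word, x, y, width=40):
        # Bits x..y of word, numbered from the most significant (0) down.
        return (word >> (width - 1 - y)) & ((1 << (y - x + 1)) - 1)

    def fetch(s, M, left_required=True):
        if s["IBR"] is None:                  # next instruction not in IBR
            s["MAR"] = s["PC"]
            s["MBR"] = M[s["MAR"]]
            if left_required:                 # take left half, save right half
                s["IR"], s["MAR"] = bits(s["MBR"], 0, 7), bits(s["MBR"], 8, 19)
                s["IBR"] = bits(s["MBR"], 20, 39)
            else:                             # take right half directly
                s["IR"], s["MAR"] = bits(s["MBR"], 20, 27), bits(s["MBR"], 28, 39)
        else:                                 # no memory access required
            s["IR"] = bits(s["IBR"], 0, 7, width=20)
            s["MAR"] = bits(s["IBR"], 8, 19, width=20)
            s["IBR"] = None
        s["PC"] += 1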
Questions to Lecture 3.
1. What is a Hardwired Program? (What is programming in hardware?)
2. What is a Software Program? (What is programming in software?)
3. Describe the functional structure of computer components (top-level view) from the
point of view of the interconnection subsystem.
4. What is the Main Cycle of Instruction Processing (MCIP)?
5. Describe the architecture of the "Hypothetical Machine". What is the difference
between a translator and an interpreter?
6. Describe each step of the MCIP on the "Hypothetical Machine" for one concrete
instruction.
7. Describe each step of the MCIP on the IAS for one concrete instruction.
Lecture 4
Interrupts. The goal of the lecture: to analyze and study interrupts, classes of interrupts, program flow control, and the interrupt cycle.
Contents
1. Interrupts. Classes of interrupts.
2. Program Flow Control.
3. Interrupt Cycle.
Literature.
1. Stallings W. Computer Organization and Architecture. Designing and performance, 5th ed. - Upper Saddle River, NJ : Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer organization, 4th ed. - McGRAW-HILL INTERNATIONAL EDITIONS, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. - Upper Saddle River, NJ : Prentice Hall, 2002.
Interrupts
A mechanism by which other modules (e.g. I/O, memory) may interrupt the normal sequence of processing. Interrupts are changes in the control flow caused not by the program itself but by something else, usually connected with the I/O process. An interrupt is a temporary cessation of the process caused by an event which is external with respect to that process.
Classes of the most common interrupts:
Program - e.g. overflow, division by zero.
Timer - generated by an internal processor timer.
I/O - from an I/O controller.
Hardware failure - e.g. memory parity error.
Interrupts are provided primarily as a way to improve processing efficiency.
Program Flow Control
Program Flow Control is an abstraction (some sort of virtual operations) over the set of all
possible sequences of execution in the program.
1, 2 and 3 - code segments: sequences of instructions that do not involve I/O.
The WRITE calls are calls to an I/O program that is a system utility and that will perform the
actual I/O operation.
The I/O operation consists of three sections:
4 - a sequence of instructions which prepare for the operation;
the I/O command - the actual I/O command;
5 - a sequence of instructions which complete the operation.
The user’s program doesn’t have to contain any special code to accommodate
interrupts; the processor and the operating system are responsible for suspending the
user’s program and then resuming it at the same point.
Interrupt Cycle (IC)
With the interrupt mechanism the processor can be engaged in executing other instructions
while an I/O operation is in progress.
The IC is added to the instruction cycle to accommodate interrupts:
The processor checks for an interrupt, indicated by an interrupt signal if interrupts are pending
(the I/O module sends an interrupt request signal to the processor when the external device
becomes ready to be serviced).
If no interrupt is pending, fetch the next instruction.
If an interrupt is pending:
suspend execution of the current program;
save the context;
set the PC to the start address of the interrupt handler routine;
process the interrupt;
restore the context and continue the interrupted program.
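A sketch of the extended cycle (the methods on cpu below - fetch, execute, save_context and so on - are assumed helpers, not a real API):

    # Instruction cycle with the interrupt cycle appended at the end.
    def run(cpu, interrupts_enabled=True):
        while not cpu.halted:
            instruction = cpu.fetch()           # fetch cycle
            cpu.execute(instruction)            # execute cycle
            if interrupts_enabled and cpu.interrupt_pending():
                cpu.save_context()              # save PC and registers
                cpu.PC = cpu.handler_address()  # start of the interrupt handler
                cpu.process_interrupt()
                cpu.restore_context()           # resume the interrupted program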
Instruction Cycle (with Interrupts) - State Diagram: START -> fetch next instruction (fetch
cycle) -> execute instruction (execute cycle) -> if interrupts are enabled, check for an interrupt
and process it (interrupt cycle) -> back to the fetch cycle, until HALT. With interrupts
disabled, the interrupt cycle is skipped and execution proceeds directly to the next fetch.
Program timing, short I/O wait: a) without interrupts, the processor waits while each I/O
operation completes (segments 1, 4, wait, 5, 2, 4, wait, 5, 3); b) with interrupts, the program
segments (2a, 2b, 3a, 3b) overlap the I/O operations and the processor waits disappear.
Program timing, long I/O wait: a) without interrupts, the processor waits for the whole of
each I/O operation; b) with interrupts, the processor still waits, but only for the part of the I/O
operation that has not yet completed when the next WRITE call is reached.
Flowcharts of data-block input under the three I/O techniques:

Programmed I/O: issue a read command to the I/O module (CPU -> I/O); read the status of
the I/O module (I/O -> CPU); check the status: an error condition aborts, "not ready" repeats
the status read; when ready, read a word from the I/O module (I/O -> CPU) and write the
word into memory (CPU -> memory); if not done, repeat for the next word; when done,
proceed to the next instruction.

Interrupt-driven I/O: issue a read command to the I/O module and do something else; on the
interrupt, read the status of the I/O module (I/O -> CPU); check the status: an error condition
aborts; when ready, read a word from the I/O module and write it into memory; if not done,
repeat; when done, proceed to the next instruction.

Direct Memory Access: issue a read-block command to the DMA module (CPU -> DMA)
and do something else; on the interrupt (DMA -> CPU), read the status of the DMA module;
proceed to the next instruction.
Three I/O Techniques.
Programmed I/O.
With programmed I/O, data is exchanged between the CPU and the I/O module. The CPU
executes a program that gives it direct control of the I/O operation, including sensing device
status, sending a read or write command, and transferring data.
To execute an I/O-related instruction, the CPU issues an address, specifying the particular
module and external device, and an I/O command. There are four types of I/O commands that
an I/O module may receive when it is addressed by the CPU: control, test, read, and write.
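A sketch of the polling loop this implies (device.command, device.status and device.data are hypothetical helpers, not a real API):

    # Programmed I/O: the CPU itself polls the device and moves every word.
    BUSY, READY, ERROR = "busy", "ready", "error"

    def programmed_input(device, memory, start, count):
        for offset in range(count):
            device.command("read")          # issue read command to I/O module
            while device.status() == BUSY:  # read status; the CPU is tied up here
                pass
            if device.status() == ERROR:
                raise IOError("device reported an error")
            memory[start + offset] = device.data()  # word from module to memory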
Interrupt-driven I/O.
With interrupt-driven I/O, the CPU issues an I/O command, continues to execute other
instructions, and is interrupted by the I/O module when the latter has completed its work.
With both programmed and interrupt-driven I/O the CPU is responsible for extracting data
from main memory for output and storing data in main memory for input. Both forms suffer
from two inherent drawbacks:
1. The I/O transfer rate is limited by the speed with which the CPU can test and service a
device.
2. The CPU is tied up in managing an I/O transfer; a number of instructions must be executed
for each I/O transfer.
Direct Memory Access (DMA).
DMA permits the I/O module and main memory to exchange data directly, without CPU
involvement.
DMA involves an additional module on the system bus (the DMA controller). The DMA
controller is capable of mimicking the CPU and, indeed, of taking over control of the system
from the CPU. The technique works as follows: when the CPU wishes to read or write a
block of data, it issues a command to the DMA module, sending the following information:
whether a read or write is requested;
the address of the I/O device involved;
the starting location in memory to read from or write to;
the number of words to be read or written.
The CPU then continues with other work. It has delegated this I/O operation to the DMA
module, and that module will take care of it. The DMA module transfers the entire block of
data, one word at a time, directly to or from memory, without going through the CPU. When
the transfer is complete, the DMA module sends an interrupt signal to the CPU. Thus the
CPU is involved only at the beginning and the end of the transfer.
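The same handshake, seen from the CPU, can be sketched as follows (the dma.setup, dma.on_interrupt and dma.start calls are invented names for illustration):

    # DMA block input: the CPU only programs the controller and is next
    # involved when the completion interrupt arrives.
    def dma_read_block(dma, device_addr, mem_start, word_count, on_complete):
        dma.setup(op="read",          # whether a read or write is requested
                  device=device_addr, # the address of the I/O device involved
                  address=mem_start,  # the starting location in memory
                  count=word_count)   # the number of words to be transferred
        dma.on_interrupt(on_complete) # called when the whole block has moved
        dma.start()
        # ... the CPU continues with other work during the transfer ...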
Multiple Interrupts
Disabled interrupts (sequential processing): the processor ignores further interrupts whilst
processing one interrupt; interrupts remain pending and are checked after the first interrupt
has been processed; interrupts are handled in the sequence in which they occur.
Defined priorities (nested processing): low-priority interrupts can be interrupted by
higher-priority interrupts; when the higher-priority interrupt has been processed, the
processor returns to the previous interrupt.
Multiple Interrupts - Sequential
This approach is nice and simple, as interrupts are handled in strict sequential order.
The drawback of this approach is that it does not take into account relative priority or
time-critical needs.
Multiple Interrupts - Nested
This approach is to define priorities for interrupts and to allow an interrupt of higher priority
to cause a lower-priority interrupt handler to be itself interrupted.
Example.
Consider a system with three I/O devices:
a printer (priority 2);
a disk (priority 4);
a communication line (priority 5).
Let the user program begin at t = 0. At t = 10 a printer interrupt occurs; user information is
placed on the stack, and execution continues at the printer interrupt service routine (ISR).
While this routine is still executing, at t = 15, a communication interrupt occurs. Since the
communication line has higher priority, the interrupt is honored and the printer ISR is
interrupted. The state of the printer ISR is pushed onto the stack, and execution continues at
the communication ISR. While this routine is executing, a disk interrupt occurs at t = 20.
Since this interrupt is of lower priority, it is simply held, and the communication ISR runs to
completion. When the communication ISR is complete (t = 25), the previous processor state
is restored, i.e. execution of the printer ISR. However, before even a single instruction in this
routine can be executed, the processor honors the higher-priority disk interrupt, and control
transfers to the disk ISR. Only when that routine is complete (t = 35) is the printer ISR
resumed. When that routine completes (t = 40), control finally returns to the user program.
Time sequences of multiple interrupts: the user program runs from t = 0; the printer ISR runs
from t = 10; the communication ISR runs from t = 15 to t = 25; the disk ISR runs from t = 25
to t = 35; the printer ISR resumes from t = 35 to t = 40; the user program resumes at t = 40.
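The nested discipline can be sketched with a priority queue; the priorities follow the example above, and the function names are invented for illustration:

    # Sketch of nested (priority) interrupt handling: a pending lower-priority
    # request is held until every higher-priority handler has finished.
    import heapq

    def handle(current_priority, pending):
        # pending is a heap of (-priority, name): highest priority pops first.
        while pending and -pending[0][0] > current_priority:
            neg, name = heapq.heappop(pending)
            print("enter", name, "ISR (priority", -neg, ")")
            handle(-neg, pending)     # a higher-priority request may nest here
            print("leave", name, "ISR")

    pending = [(-2, "printer"), (-5, "communication"), (-4, "disk")]
    heapq.heapify(pending)
    handle(0, pending)                # the user program runs at priority 0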
DMA Transfer in a Computer System: the DMA controller sits on the system bus alongside
the CPU, the memory and the I/O device, sharing the RD, WR, address and data lines. An
address decoder selects the controller's interface registers (via DS, DMA select, and RS,
register select). The controller exchanges DMA request / DMA acknowledge signals with the
I/O device, raises BR (bus request) to the CPU and waits for BG (bus granted) before driving
the bus, and raises an interrupt when the transfer is complete.
Address - the address register; specifies the desired location of a word in memory.
WCR - the word-count register; specifies the number of words that must be transferred.
CR - the control register; specifies the mode of transfer.
RD - Read; WR - Write.
DS - DMA Select; RS - Register Select (interface registers).
BG - Bus Granted; BR - Bus Request.
Direct Memory Access Technique.
CPU initializes the DMA by sending the following information through Data Register:
1) the starting address of the Memory block;
2) the word count (number of words in this block);
3) type of operation (Read or Write);
4) a control bit to start the DMA transfer.
After this, the CPU stops communicating with the DMA controller unless it receives an interrupt signal or needs to check how
many words have been transferred.
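As a concrete illustration of this initialization sequence, here is a hedged C sketch. The register addresses, names (DMA_ADDR, DMA_WCR, DMA_CR) and control-bit layout are hypothetical, invented for the example; they do not describe any particular chip.

```c
/* Hypothetical memory-mapped DMA controller (addresses and bit layout are
 * illustrative only). The sequence mirrors the four steps listed above. */
#include <stdint.h>

#define DMA_BASE 0xFFFF0000u
#define DMA_ADDR (*(volatile uint32_t *)(DMA_BASE + 0x0)) /* address register    */
#define DMA_WCR  (*(volatile uint32_t *)(DMA_BASE + 0x4)) /* word-count register */
#define DMA_CR   (*(volatile uint32_t *)(DMA_BASE + 0x8)) /* control register    */

#define DMA_CR_WRITE (1u << 0)  /* 0 = read from memory, 1 = write to memory */
#define DMA_CR_START (1u << 1)  /* control bit that starts the transfer      */

static void dma_start(uint32_t block_addr, uint32_t nwords, int write_to_mem) {
    DMA_ADDR = block_addr;  /* 1) starting address of the memory block */
    DMA_WCR  = nwords;      /* 2) word count                           */
    DMA_CR   = (write_to_mem ? DMA_CR_WRITE : 0u)   /* 3) operation type */
             | DMA_CR_START;                        /* 4) start transfer */
    /* The CPU now continues with other work; the controller raises an
       interrupt when the whole block has been transferred. */
}
```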
Literature.
M.M. Mano, C.R. Kime. Logic and Computer Design Fundamentals. Part 2 (pp. 557-561).
Address Space Allocation. The Address Space (AS) is the set of addresses which the
microprocessor is able to generate. The allocation of the general components in the address
space is unified, as shown in Fig. 1.
Fig. 1. Typical Allocation of Address Space (under the MS-DOS operating system)

Region                                  Volume of AS   Physical addr.  Segment addr.
Vectors of interruptions                1 Kb           00000h          0000h
Area of BIOS's data                     256 bytes      00400h          0040h
Free memory for application programs    rest of the
                                        usual memory   00500h          0050h
Graphical video buffer                  64 Kb          A0000h          A000h
Free addresses                          32 Kb          B0000h          B000h
Text video buffer                       32 Kb          B8000h          B800h
Permanent storage of BIOS extensions    64 Kb          C0000h          C000h
Free addresses                          128 Kb         D0000h          D000h
Permanent storage of BIOS               64 Kb          F0000h          F000h
High Memory Area (HMA)                  64 Kb          100000h
Extended Memory (XMS)                   up to 4 Gb     10FFF0h

The first three regions (00000h - 9FFFFh) form the usual (conventional) memory, 640 Kb in total; the regions from A0000h to FFFFFh form the senior (upper) memory, 384 Kb; everything from 100000h upward belongs to the extended memory.
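The segment addresses in Fig. 1 relate to the physical addresses by the standard real-mode rule: physical address = segment x 16 + offset. A small sketch:

```c
/* Real-mode 8086 address arithmetic: a 20-bit physical address is formed
 * from a 16-bit segment and a 16-bit offset. */
#include <stdint.h>
#include <stdio.h>

static uint32_t physical(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;  /* segment * 16 + offset */
}

int main(void) {
    /* B800:0000 -> B8000h, the start of the text video buffer in Fig. 1. */
    printf("%05Xh\n", (unsigned)physical(0xB800, 0x0000));
    return 0;
}
```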
Apparatus Organization of Interrupts.
Hardware (apparatus) interrupt signals generated by computer devices do not come to the
microprocessor directly, but pass through two interrupt controllers: the Leading Interrupts
Controller and the Driven Interrupts Controller.
[Figure: Cascaded interrupt controllers. Request lines IRQ0 - IRQ7 (timer, keyboard, floppy disk, printer, ...) enter the Leading Interrupts Controller (base vector 08h); lines IRQ8 - IRQ15 (mouse, hard disk, ...) enter the Driven Interrupts Controller (base vector 70h), whose output is cascaded into the leading controller. The leading controller drives the INT signal and supplies the vector's number to the processor.]
Pict. Procedure of Interrupt Service.
[Figure: The interrupt vector table occupies low memory addresses 0, 2, 4, 6, ...; interrupt vector n holds the IP of interrupt handler n at byte address 4n and its CS at byte address 4n + 2. On an interrupt, the Flags, CS and IP of the interrupted process are pushed onto the stack (SP points at them at the moment of the interrupt), and the processor loads the handler's CS:IP from the vector table.]
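The table lookup shown in the picture can be written out directly: vector n occupies four bytes starting at address 4n, the handler's IP first and its CS at 4n + 2. A hedged sketch (real-mode pointer arithmetic; this only works in real mode or an emulator, not under a protected-mode OS):

```c
/* Reading entry n of the real-mode interrupt vector table, which starts at
 * physical address 0: the handler's IP is the word at 4n, its CS the word
 * at 4n + 2. Illustrative only. */
#include <stdint.h>

typedef struct { uint16_t ip, cs; } FarPtr;

static FarPtr read_vector(unsigned n) {
    volatile const uint16_t *ivt = (volatile const uint16_t *)0; /* 0000:0000 */
    FarPtr v;
    v.ip = ivt[2 * n];      /* word at byte address 4n     */
    v.cs = ivt[2 * n + 1];  /* word at byte address 4n + 2 */
    return v;
}
```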
Stages of Development of the Input/Output Subsystem in the Course of Computer System
Evolution.
The First Stage. The CPU directly controls all external devices. Today this technique is
used only in the simplest devices with microprocessor control.
The Second Stage. A controller of the external device, or an input/output module, is
included in the computer system. During data exchange the CPU uses the programmed
input/output technique without interrupts. The CPU entrusts most functions of controlling
the individual units of the external device to the input/output module and is relieved of the
care of a direct interface with the external device.
The Third Stage. The same system configuration is used as in the previous stage, but the
exchange process is based on interrupts. The CPU no longer wastes time waiting until the
external device is ready to exchange the next portion of data (this significantly increased
computer performance).
The Fourth Stage. Input/output modules gain the ability of direct access to RAM through
the DMA controller, and the exchange process now proceeds practically without the CPU's
participation. The CPU is needed only to initiate (start) the exchange session and to receive
the signal of the session's termination.
The Fifth Stage. At this stage the input/output module becomes an input/output processor
with its own rights in the system, able to execute specialized instructions. The CPU sends
this module (processor) only the instruction "execute the program" (the program itself is
stored in RAM). The module runs this program on its own and, after the program
terminates, informs the CPU that the work is complete.
The Sixth Stage. Now the I/O module is able not only to run specialized programs, but is
also equipped with its own block of local memory. Thus, as a matter of fact, it becomes a
full-fledged computer within the computer system; such a specialized computer can serve
exchanges with a multitude of external devices with only minimal participation of the CPU.
Questions to Lecture № 4.
1. What do we mean by interrupts? What is the main
reason for using the interrupt mechanism?
2. Draw up diagrams of the Program Flow Control without
interrupts and with interrupts, describe each fragment of the
Program Flow Control.
3. Which classes of interrupts must be enabled constantly? (give
explanation)
4. Describe the mechanism of work with interrupts.
5. In the diagram “Program Flow Control” find points, which
correspond to interrupts of user’s program and explain the
necessity of using these interrupts.
6. How many techniques of I/O operations execution are used?
Describe each of these techniques and compare them.
7. Which approaches can be taken to dealing with multiple
interrupts? Show advantages and disadvantages of these
approaches.
Lecture 5. System Buses.
I.
1. Interconnections of base computer components through the bus.
2. Bus structure.
3. Bus hierarchy.
II.
1. Elements of Bus Design (Types, Methods of Arbitration, Timing).
2. PCI bus. Instructions of PCI bus. Data transaction and arbitration of PCI bus.
Literature.
1. Stallings W. Computer Organization and Architecture. Designing and performance, 5th ed. – Upper Saddle River, NJ: Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer Organization, 4th ed. – McGraw-Hill International Editions, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. – Upper Saddle River, NJ: Prentice Hall, 2002.
Connecting
In effect, a computer is a network of basic modules (CPU, Memory,
I/O), thus, there must be paths for connecting the modules together.
The way of connecting the various modules is called the
interconnection structure.
All the units must be connected
Different types of connections for different types of units
Memory
Input/Output
CPU
Memory Connection
Receives and sends data
Receives addresses (of locations)
Receives control signals
Read
Write
Timing
[Figure: Memory module of N words (addresses 0 ... N-1). Inputs: Read, Write, Address, Data; output: Data.]
Input/Output Connection(1)
Similar to memory from computer’s viewpoint
Output
Receive data from computer
Send data to peripheral
Input
Receive data from peripheral
Send data to computer
Input/Output Connection(2)
Receive control signals from computer
Send control signals to peripherals
e.g. spin disk
Receive addresses from computer
e.g. port number to identify peripheral
Send interrupt signals (control)
[Figure: I/O module with M ports. Inputs: Read, Write, Address, Internal Data, External Data; outputs: Internal Data, External Data, Interrupt Signals.]
CPU Connection
Reads instructions and data
Writes out data (after processing)
Sends control signals to other units
Receives (& acts on) interrupts
[Figure: CPU. Inputs: Instructions, Data, Interrupt Signals; outputs: Address, Data, Control Signals.]
The interconnection structure is determined by the character of the exchange
operations, which are specific to each module.
Major forms of input and output for the modules:
Memory: Typically, a memory module consists of N words of equal length. Each word
is assigned a unique numerical address (0, 1, ..., N-1). A word of data can be read from or
written into the memory. The nature of the operation is indicated by READ or WRITE
control signals. The location for the operation is specified by an address.
I/O Module: It is functionally similar to the memory (from the internal point of view).
There are two operations, READ and WRITE. Further, an I/O module may control more
than one external device. We can refer to each of the interfaces to an external device as a port
and give each a unique address (e.g., 0, 1, 2, ..., M-1). In addition, there are external data
paths for the input and output of data with an external device. Finally, an I/O module may be
able to send interrupt signals to the CPU.
CPU: CPU reads in instructions and data, writes out data after processing, and uses
control signals to control the overall operation of the system. It also receives
interrupt signals.
Types of transfers supported by interconnection structure.
Memory to CPU: The CPU reads an instruction or unit of data
from memory.
CPU to Memory: The CPU writes a unit of data to memory.
I/O to CPU: The CPU reads data from I/O device via an I/O
module.
CPU to I/O: The CPU sends data to the I/O device.
I/O to or from the Memory: For these two cases, an I/O
module is allowed to exchange data directly with memory,
without going through the CPU, using direct memory access
(DMA).
A multiplexer is a functional device that permits two or more data-link channels to jointly use the same common data-transfer device.
Buses
There are a number of possible interconnection systems
Single and multiple BUS structures are most common
e.g. Control/Address/Data bus (PC)
e.g. Unibus (DEC-PDP)
What is a Bus?
A bus is a set of electric pathways and service
electronic devices (framing), providing exchange
of data among computer units and devices.
A communication pathway connecting two or more devices is a bus.
Often grouped
A number of channels in one bus e.g. 32 bit data bus is 32 separate single bit channels
Power lines may not be shown
What do buses look like?
Parallel lines on circuit boards
Ribbon (ленточный) cables
Strip (полоса) connectors on mother boards e.g. PCI (Peripheral Component Interconnect)
Sets of wires
A system bus consists, typically, of from 50 to 100 separate lines, which can be
classified into three functional groups: data, address and control lines (power lines are
usually omitted ).
Data Bus (Line)
The data lines provide a path for moving data between system modules. The number of lines is
referred to as the WIDTH of the data bus (the number of lines determines how many bits can be
transferred at a time)
Carries data
Remember that there is no difference between “data” and “instruction” at this level!
Width of Data Bus is a key determinant of the system performance
8, 16, 32, 64 bit
Bus Structure
Address Bus (Line)
Identify the source or destination of data
(e.g. CPU needs to read an instruction (data) from a given location in memory)
Address Bus width determines maximum memory capacity of the system.
Used to address both the Main Memory and the I/O ports (the higher-order bits are used to select a particular module on the bus, and the lower-order bits select a memory cell or I/O port within the module). E.g., if the width of a bus is 8, then codes 01111111 and below specify cell addresses in the Main
Memory module (the module with address 0), and codes 10000000 and above specify I/O ports which are under the control of the module with address 1.
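The decoding rule in this example is easy to state in code (a sketch that follows the 8-bit example above):

```c
/* Decoding an 8-bit bus address as in the example above: the high-order
 * bit selects the module (0 = Main Memory, 1 = I/O), the low-order bits
 * select a cell or port within it. */
#include <stdio.h>
#include <stdint.h>

typedef struct { int is_io; uint8_t local; } Target;

static Target decode(uint8_t addr) {
    Target t;
    t.is_io = (addr & 0x80) != 0;  /* codes 10000000b and above select I/O */
    t.local = addr & 0x7F;         /* cell address or port number          */
    return t;
}

int main(void) {
    Target t = decode(0x9C);  /* 10011100b: I/O port 0011100b = 28 */
    printf("%s, local address %u\n",
           t.is_io ? "I/O port" : "memory cell", t.local);
    return 0;
}
```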
Command signals specify operations to be performed. Typical control lines include:
Memory Write: Causes data on the bus to be written into the addressed location.
Memory Read: Causes data from the addressed location to be placed on the bus.
I/O Write: Causes data on the bus to be output to the addressed I/O port.
I/O Read: Causes data from the addressed I/O port to be placed on the bus.
Transfer ACK: Indicates that data have been accepted from or placed on the bus.
Bus Request: Indicates that a module needs to gain control of the bus.
Bus Grant: Indicates that a requesting module has been granted control of the bus.
Interrupt request: Indicates that interrupt is pending.
Interrupt ACK: Acknowledges that the pending interrupt has been recognized.
Control Bus(Line)
Is used to control the access to and the use of the data and
address lines.
Control and timing information(indicate validity of data and address information)
Memory read/write signal
Interrupt request
Clock: Used to synchronize operations.
Reset: Initializes all modules.
The operation of any bus is as follows:
If one of the modules “wishes” to send data to another, it must do two things:
1. Obtain the use of the bus;
2. Transfer data through the bus.
If one of the modules "wishes" to receive data from another module, it must:
1. Obtain the use of the bus;
2. Send a request to the other module by placing the corresponding code on the
address lines and forming signals on the appropriate control lines.
Computer systems contain a number of different buses that provide pathways
between components at various levels of the computer systems hierarchy.
A bus that connects major computer components
(CPU, Memory, I/O) is called a System Bus.
Bus Interconnection Scheme
[Figure: CPU, Memory and I/O modules attached to a common System Bus; boards plug into the bus lines.]
Buses Hierarchy.
Single Bus Problems:
Lots of devices on one bus leads to:
Propagation delays (задержки распространения).
Long data paths mean that co-ordination of bus use can
adversely affect (неблагоприятно сказываться)
performance (dynamic characteristics become worse).
If aggregate data transfer approaches bus capacity the
system’s work may become unreliable.
Most systems use multiple buses organised by hierarchy
principle to overcome these problems
Traditional Bus Architecture (ISA –Industry Standard
Architecture) (with cache)
Up to now the Traditional Bus Architecture has been widely used. In this case the Computer
System includes a Local Bus, which connects the CPU, the Cache Memory and some peripheral
devices. The Cache Memory Controller provides connections not only to the Local Bus, but to
the System Bus as well (all modules of the Main Memory are connected to the System Bus).
Under such a structure all input-output processes go through the System Bus, bypassing the
CPU, which allows the CPU to perform more important operations.
Connecting peripheral devices not directly to the System Bus, but to an additional bus (the
Expansion Bus), which buffers the data circulating between the Main Memory and the
peripheral devices' controllers, makes it possible to support a large variety of external devices
and, at the same time, to separate the information flows "CPU – Memory" and "Memory – I/O
Controllers".
The appearance of new high-performance external devices demands higher data-transfer
speed over the buses; this is why one more bus, a High-Speed Bus, is often used in contemporary
computer systems. This bus unites the high-speed external devices and is connected with the
System Bus through a special concordance module (модуль согласования), a Bridge. This kind
of structure is called the Mezzanine Architecture (Мезонинная Архитектура).
The advantage of this structure: high-speed peripheral devices are integrated with the
processor and at the same time may work independently. This means that the functioning of the
bus doesn't depend on the CPU architecture and vice versa.
SCSI- Small Computer System Interface; LAN – Local Area Network; PI394 – Peripheral Interface (high-speed)
High Performance Bus
Bus Types
Dedicated
Separate data & address lines
Multiplexed
Shared Lines
Address valid or Data valid control lines
Advantage – fewer lines
Disadvantages
More complex control
Reduction in performance
Physically Dedicated
The use of multiple buses, each of which may connect only some
certain modules (Expansion Bus, High-Speed Bus)
Advantage – high throughput (there is less contention on each bus)
Disadvantage – increased size and cost of the system.
Bus Arbitration
More than one module controlling the bus
e.g. CPU and DMA controller
Only one module may control bus at one time (to be a master)
Arbitration may be centralised or distributed
Centralised Arbitration
Single hardware device controlling bus access
Bus Controller
Arbiter
May be part of CPU or separate
Distributed Arbitration
Each module may claim the bus
Control logic on all modules
Timing
Co-ordination of events on bus
Synchronous
Events determined by clock signals
Control Bus includes clock line
A single 1-0 is a bus cycle
All devices can read clock line
Usually sync on leading edge
Usually a single cycle for an event
Asynchronous
Scheme for controlling data transfers on the bus is based on the use of handshake(квитирование) between the initiator and the target.
The clock line is replaced by two timing control lines “READY” (“MSYN”) and “ACCEPT” (“SSYN”).
Synchronous Timing Diagram
Read Operation
The CPU issues Read
signal and places memory
Address on the address bus,
issues a Start signal to
mark the presence (validity)
of the address. The memory
module recognizes the
address and after a delay of
1 bus cycle it places the
Data and Acknowledge
signal on the bus
Timing refers to the way in which events are coordinated on the bus.
Asynchronous Timing Diagram
Read Operation
The CPU places Address and
Read signals on the bus.
After pausing for the signals
to stabilize, it issues an
MSYN (master sync) signal,
indicating the presence of
valid address and control
signals. The memory module
responds with Data and
SSYN (slave) signal,
indicating the response.
With synchronous timing the occurrence of events on the bus is determined by
a clock.
The bus includes a clock line upon which a clock transmits a regular sequence of alternating 1s
and 0s of equal duration. A single 1-0 transmission is referred to as a clock cycle (bus
cycle) and defines a time slot (интервал). All other devices on the bus can read the clock
line, and all events start at the beginning of a clock cycle. Other bus signals may change at the
leading edge of the clock signal.
With asynchronous timing the occurrence of one event on a bus follows
and depends on the occurrence of a previous event.
Synchronous timing.
Advantages: simple to implement and test.
Disadvantages: less flexible (all devices are tied to a fixed clock rate); the system cannot
take advantage of advances in device performance.
Asynchronous timing.
Advantages: flexible; allows newer technology to be used and a mixture of slow and fast
devices.
Disadvantages: more complex to implement and test.
In actual implementations, electronic switches are used. The output gate of a
register is capable of being electrically disconnected from the bus or of placing a
0 or a 1 on the bus. Because it supports these three possibilities, such a gate is
said to have a three-state output. A separate control input is used either to
enable the gate output to drive the bus to 0 or to 1, or to put it in a high-
impedance (electrically disconnected) state. The latter state corresponds to the
open-circuit state of a mechanical switch.
PCI Bus
Peripheral Component Interconnect; high-bandwidth, processor-independent; functions as a mezzanine or peripheral bus
Intel released to public domain
32 or 64 bit, 33 (66)MHz, a transfer rate 264(528) Mbytes/sec
50 lines
PCI Bus Lines (required)
1. System lines: including clock and reset.
2. Address & Data: 32 time-multiplexed lines for address/data, plus lines to interpret and validate them.
3. Interface Control: controls the timing of transactions and provides coordination among
initiators and targets.
4. Arbitration: not shared; each master has a direct connection to the PCI bus arbiter.
5. Error lines.
PCI Bus Lines (optional)
Interrupt lines: not shared.
Cache support.
64-bit Bus Extension: additional 32 time-multiplexed lines, plus 2 lines that enable devices to agree to use 64-bit transfer.
JTAG/Boundary Scan: for testing procedures.
PCI Commands
Transaction between initiator (master) and target
Master claims bus
Determine type of transaction e.g. I/O read/write
Address phase
One or more data phases
PCI Read Timing Diagram
All events are synchronized to the falling transitions of the
clock, which occur in the middle of each clock cycle.
The following are significant events, labeled on the diagram:
a. The master begins transaction by asserting FRAME (this
PCI signal indicates the start and duration of a transaction.
It is asserted at the start and unasserted when the initiator
(master) is ready to begin the final data phase). The master
also puts the start address on the AD (address line, which
is multiplexed and used for address and data transfer, 64
bits). On the C/BE lines (these multiplexed lines indicate
which of the four bytes lanes carry meaningful data) the
master puts the READ command.
b. At the start of clock 2, the target will recognize its address.
c. The master ceases driving the AD bus and changes the
information on the C/BE lines to designate which AD lines
are to be used for the currently addressed data. The
initiator also asserts IRDY (Initiator Ready, driven by the
current bus master: during a READ operation it indicates
that the master is prepared to accept data; during a WRITE
operation it indicates that valid data is present on AD).
d. The selected target asserts DEVSEL (Device Select,
asserted by the target when it has recognized its address;
indicates to the current initiator whether any device has been
selected) to indicate that it has recognized its address; it also
places the requested data on the AD lines and asserts TRDY to
indicate that valid data is present.
e. The initiator reads data at the beginning of clock 4 and changes the bus enable lines as needed in preparation for the next READ.
f. The target deasserts TRDY to signal the initiator that there will not be new data during the coming cycle.
g. The target places the third data item on the bus, but the initiator is not yet ready to read it, and therefore deasserts IRDY; this causes
the target to maintain the third data item on the bus for an extra clock cycle.
h. The initiator "knows" that the third data item is the last one, so it deasserts FRAME to signal the target that this is the last data transfer; it also
asserts IRDY to signal that it is ready to complete the transfer.
i. The initiator deasserts IRDY, returning the bus to the idle state, and the target deasserts TRDY and DEVSEL.
PCI Bus Arbitration
PCI makes use of a centralized, synchronous arbitration
scheme in which each master has its own unique request (REQ)
and grant (GNT) signals. These signal lines are attached to
a central arbiter, and a simple request-grant handshake is used to
grant access to the bus.
When two devices A and B are arbitrating for the bus,
the following sequence occurs:
a. At some point prior to the start of clock 1, A has
asserted its REQ signal. The arbiter samples this signal
at the beginning of clock cycle 1.
b. During clock cycle 1, B requests use of the bus by
asserting its REQ signal.
c. At the same time, the arbiter asserts GNT-A to grant
Bus access to A.
d. Bus master A samples GNT-A at the beginning of
clock 2 and learns that it has been granted bus access.
It also finds IRDY and TRDY unasserted, indicating
that the bus is idle. Accordingly, it asserts FRAME,
places the address information on the address bus and
the command on the C/BE bus, and continues to
assert REQ-A, because it has a second transaction to
perform after this one.
e. The bus arbiter samples all REQ lines at the beginning
of clock 3 and makes an arbitration decision to grant
the bus to B for the next transaction. It then asserts
GNT-B and deasserts GNT-A. B will not be able to
use the bus until the bus returns to an idle state.
f. A deasserts FRAME to indicate that the last data transfer is in progress. It puts the data on the data bus and signals the target with IRDY.
The target reads the data at the beginning of the next clock cycle.
g. At the beginning of clock 5, B finds IRDY and FRAME deasserted and so is able to take control of the bus by asserting FRAME. It
also deasserts its REQ line, because it only wants to perform one transaction.
Synchronous Bus (SB) On a SB all devices derive timing information from a common clock line.
Equally spaced pulses on this line define equal time intervals; each interval constitutes a bus cycle, during which one
data transfer can take place.
Such a scheme is illustrated below. In the scheme the address and data lines are shown as high and low at the same time
(this indicates that some lines are high and some low, depending on the particular address or data pattern being transmitted).
The crossing points indicate the times at which these patterns change.
A signal line in an indeterminate state (or high impedance state) is represented by an intermediate level halfway between
the low and high signal levels.
The sequence of events during an input (read) operation.
At time t0 the processor places the device address on the address lines and sets the mode control lines to indicate an input
operation. This information travels over the bus at a speed determined by its physical and electrical characteristics. The
clock pulse width t1 - t0 should be chosen such that it is greater than the maximum propagation delay between the CPU and any
device connected to the bus. It should also be wide enough to allow all devices to decode the address and control signals, so
that the addressed device can be ready to respond at time t1. The addressed device, recognizing that an input operation is
requested, places its input data on the data lines at time t1. At the end of the clock cycle, that is, at time t2, the CPU
strobes the data lines and loads the data into its input buffer (here, "strobe" means to determine the value of the data at a
given instant). For data to be loaded correctly into a storage device, the data must be available at the input of that device for
a period greater than the setup time of the device. Hence, the period t2 - t1 must be greater than the maximum propagation
time on the bus plus the setup time of the input buffer register of the CPU.
The procedure for an output operation is similar to that for the input sequence. The processor places the output data on
the data lines when it transmits the address and the mode information. At time t1, the addressed device strobes the data lines
and loads the data into its data buffer.
The synchronous bus scheme is simple and results in a simple design for the device interface. The clock speed must be
chosen such that it accommodates the longest delays on the bus and the slowest interfaces. Note that the CPU has no way of
determining whether the addressed device has actually responded: it simply assumes that, at t2, the output data have been
received by the I/O device or the input data are available on the data lines; if, because of a malfunction, the device does not
respond, the error will not be detected.
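A small worked calculation of these two constraints may help; the delay values below are assumptions for illustration, not figures from the lecture.

```c
/* Minimal bus-cycle calculation for a synchronous bus, from the two
 * constraints above: t1 - t0 must cover the maximum propagation delay plus
 * address decoding, and t2 - t1 must cover the maximum propagation delay
 * plus the setup time of the CPU's input buffer. Numbers are assumed. */
#include <stdio.h>

int main(void) {
    double prop_max = 10.0;  /* max propagation delay on the bus, ns */
    double decode   = 4.0;   /* address/control decoding time, ns    */
    double setup    = 3.0;   /* input-buffer setup time, ns          */

    double phase1 = prop_max + decode;  /* lower bound on t1 - t0 */
    double phase2 = prop_max + setup;   /* lower bound on t2 - t1 */

    printf("min bus cycle: %.1f ns => max clock about %.0f MHz\n",
           phase1 + phase2, 1000.0 / (phase1 + phase2));
    return 0;
}
```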
Asynchronous Bus (AsB)
An alternative scheme for controlling data transfers on the bus is based on the
use of a handshake between the processor and the device being addressed. The
common clock is eliminated; hence, the resulting bus operation is
asynchronous. The clock line is replaced by two timing control lines, which we
refer to as Ready and Accept. In principle, a data transfer controlled by a
handshake protocol proceeds as follows:
The processor places the address and mode information on bus;
Then it indicates to all devices that it has done so by activating the Ready
line;
When the addressed device receives the Ready signal, it performs the
required operation;
After this the addressed device informs the processor it has done so by
activating the Accept line;
The processor waits for the Accept signal before it removes its signals
from the bus (in the case of a read operation, it also strobes data into its
input buffer).
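Before looking at the detailed timing below, the Ready/Accept interlock can be summarized in code. This is a minimal sketch, assuming a single read and ignoring skew and propagation delays; the shared variables and the device_memory array are invented for illustration.

```c
/* Illustrative simulation of the Ready/Accept handshake for an input (read)
 * transfer. Shared variables stand in for bus lines; in real hardware each
 * side is independent logic and the "lines" are electrical signals. */
#include <stdio.h>
#include <stdint.h>

static int      ready, accepted;          /* the two timing control lines */
static uint32_t addr_lines, data_lines;   /* address/mode and data lines  */

static uint32_t device_memory[16] = { [5] = 0xCAFE };  /* sample contents */

/* Slave side: on seeing Ready = 1, perform the operation, then answer. */
static void slave_respond(void) {
    if (ready) {
        data_lines = device_memory[addr_lines];  /* place data on the bus */
        accepted = 1;                            /* activate Accept       */
    }
}

/* Slave side: on seeing Ready drop to 0, remove data and Accept. */
static void slave_release(void) {
    if (!ready) { data_lines = 0; accepted = 0; }
}

int main(void) {
    uint32_t buf = 0;

    addr_lines = 5;       /* processor places address and mode information */
    ready = 1;            /* ... then activates the Ready line             */
    slave_respond();      /* addressed device reacts                       */

    if (accepted) {       /* processor waits for Accept                    */
        buf = data_lines; /* strobes the data into its input buffer        */
        ready = 0;        /* drops Ready: data received                    */
    }
    slave_release();      /* slave removes data and Accept: transfer done  */

    printf("read 0x%X\n", (unsigned)buf);
    return 0;
}
```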
[Figure: Timing of an input transfer on a synchronous bus. The bus clock swings between logic value 0 and logic value 1 across a threshold/forbidden range; one 1-0 transmission is a bus cycle. Address and mode information are placed on the bus at t0, the data at t1, and the data are strobed at t2.]
[Figure: Handshake control of data transfer during an input operation. Address and mode information, Ready, Accept and data lines are shown over the instants t0 ... t5, as described below.]
The sequence of events during a handshake-controlled input operation is as follows.
t0 – The processor places the address and mode information on the bus.
t1 – The processor sets the Ready line to 1 to inform the I/O unit that the address and mode information is ready. The delay t1 - t0 is intended to allow for any skew that may occur on the bus; sufficient time must also be allowed for the interface circuitry to decode the address, and this decoding delay is included in the period t1 - t0 as well.
t2 – The interface of the addressed device sets the Accept signal to 1 and gates the data from its data register to the data lines. If extra delays are introduced by the interface circuitry before it places the data on the bus, it must delay the Accept signal accordingly. The period t2 - t1 depends on the distance between the CPU and the device interface; it is also a function of the delays introduced by the interface circuitry.
t3 – The Accept signal arrives at the processor, indicating that the input data are available on the bus. However, since it was assumed that the device interface transmits the Accept signal at the same time that it places the data on the bus, the CPU must allow for bus skew. After a delay equal to the maximum bus skew, the CPU strobes the data into its input buffer; at the same time it drops the Ready signal, indicating that it has received the data.
t4 – The CPU removes the address and mode information from the bus. The delay t4 - t3 is again intended to allow for bus skew: erroneous addressing could take place if the address, as seen by some device on the bus, started to change while the Ready signal was still equal to 1.
t5 – When the device interface receives the 1-to-0 transition of the Ready signal, it removes the data and the Accept signal from the bus. This completes the input transfer.
In this diagram it is assumed that compensation for bus skew and address decoding is performed by the CPU. This simplifies the I/O interface at the device end, because the interface circuit can use the Ready signal directly to gate other signals to or from the bus.
Skew occurs when two signals transmitted simultaneously from one source arrive at the destination at different times; this happens because different lines of the bus may have different propagation speeds. Thus, to guarantee that the Ready signal does not arrive at any device ahead of the address and mode information, the delay t1 - t0 should be larger than the maximum possible skew on the bus. (Note that in the synchronous case bus skew is accounted for as a part of the maximum propagation delay.)
Mixed Synchronous/Asynchronous Bus (MS/AsB)
Another practical alternative is to use an asynchronous bus, but with a
provision that all signaling changes are synchronized with a clock. The time
elapsed between successive handshake signals is an integral number of clock
cycles. For example, the CPU may send an address during one clock cycle.
During that cycle, it asserts a signal indicating that the address is valid and that
all devices on the bus should decode this address. The addressed device
responds, when it is ready, by asserting an acknowledge signal and placing the
data on the bus (in the case of read operation). One or more clock cycles may
separate the request and the response, depending on the speed of the device
being addressed. Using the clock often leads to simpler designs of logic circuits
in device interfaces.
Many variations of the bus techniques are found in commercial computers.
For example, the bus in the 68000 family of processors has two modes of
operation, one asynchronous and one synchronous. The PowerPC bus uses the
mixed approach.
The choice of design involves trade-offs among many factors. Some of the
important considerations are:
Simplicity of the device interface;
Ability to accommodate device interfaces that introduce different amounts
of delay;
Total time required for a bus transfer;
Ability to detect errors resulting from addressing a nonexistent device or
from an interface malfunction.
The fully asynchronous scheme provides the highest degree of flexibility and
reliability, but its device interface circuit is somewhat more complex than that
of the synchronous or mixed bus. Asynchronous buses have an error-detecting
capability provided by interlocking the Ready and Accept signals. If the Accept
signal is not received within a fixed time-out period after Ready is set to 1, the
CPU assumes that an error has occurred. A bus error can be used to cause an
interrupt and execute a routine that either alerts the operator to the malfunction
or takes some other appropriate action.
Types of Operations of Data Transfer. A bus may support the following types of operations:
Read (data transfer from the slave to the master);
Write (data transfer from the master to the slave);
Read-Modify-Write (the Write is executed without the need to change the
address on the address lines);
Read-after-Write (also executed without changing the address for the Write
operation).
[Figure: Timing of the basic bus operations. With multiplexed lines, a Write occupies two cycles (address, then data) and a Read occupies three (address, access time, then data); with dedicated (separate) lines, address and data are carried in parallel, so both operations fit into a single cycle. Read-Modify-Write and Read-after-Write each transfer one address followed by two data items separated by the access time. A data-block transfer sends one address followed by several data items.]
Configuration of Computer System on the base of PCI Bus.
[Figure: The CPU and its Cache sit on the local bus (the cache attached over the cache-memory bus) and are coupled through the PCI Bridge to the Main Memory (main-memory bus) and to the PCI bus. Attached to the PCI bus are the sound board, video and graphic boards, LAN adapter, SCSI interface, and the concordance module of the extended (ISA) bus, to which the basic I/O devices are connected.]
Main Characteristics of PCI Bus Introduced in ’90 by Intel. Improved by consortium of
manufacturers (PCI SIG (Special Interest Group)).
Uses a 66 MHz clock, independent of the processor clock;
Capacity 528 Mbytes/s;
Cycle time 30 ns;
64-bit data and address lines (multiplexed);
Supports up to 16 slots;
Systems also include ISA slots for compatibility;
Mixed synchronous/asynchronous bus;
Centralized bus architecture
Structure of PCI Bus Lines. (required)
There are 5 groups of lines:
System Lines. Timing signals and initial-setting signals are transferred
through these lines. Namely:
CLK – clock; all processes on the bus are synchronized with its
rising (leading) edge. The frequency is 33 (66) MHz;
RST# – reset of all registers, counters and potential signals
(return to the initial state).
Informational Lines. Through 32 (64) lines of this group, address and
data code signals are transferred; the remaining lines are used for
interpretation and acknowledgement of data validity. Namely:
AD – multiplexed lines, which are used for address and data
transfer;
C/BE – multiplexed lines for bus commands and byte-selection
signals. During the data-transfer phase, signals on these lines
indicate which of the bytes (4 bytes are transferred at the same
time) contain the necessary information;
PAR – parity-check signal for the data on the AD and C/BE lines,
with a delay equal to one cycle.
Interface Managing Lines. Signals transferred through these lines
guarantee coordination of the master's and slave's work during an
exchange. Namely:
FRAME# – the current master asserts a signal on this line to
inform other devices that a transaction has started. The master
releases this signal when the final phase of the transaction
begins;
IRDY# – Initiator (Master) Ready. The signal on this line is
formed by the current initiator (master). During a Read operation
it indicates that the master is ready to accept data; during a
Write operation it indicates that valid data are asserted on the
AD lines;
TRDY# – Target Ready. The signal on this line is formed by the
current slave (target). During a Read operation it indicates that
valid data are present on the AD lines; during a Write operation
it indicates that the slave is ready to accept data;
STOP – the signal on this line is formed by the current slave and
informs the master that the current transaction must be stopped;
IDSEL – Initialization Device Select. This line is used for
selecting a chip during Read or Write operations in the
configuration process (any device attached to the PCI bus has
256 internal registers; the states of these registers determine the
current configuration of slaves);
DEVSEL – Device Select. This line is driven by a slave when it
has recognized its own address during the address-transfer phase
on the AD lines; for the master it serves as an indicator that a
slave has been selected.
Lines of Arbitration. Unlike the other lines, these lines are used by
separate modules individually: a pair of lines is allotted to each module
attached to the PCI bus (these two lines are directly connected to the
bus arbiter). Namely:
REQ# – using this line, a device asks the bus arbiter for
permission to use the bus;
GNT# – by a signal on this line the arbiter informs the requesting
device that it has been granted the use of the bus.
Lines of Error Indication. Signals about detected errors are transferred
through these lines. Namely:
PERR# – Parity Error. A signal on this line reports that the
control system has detected a parity error;
SERR# – System Error. Any device may assert a signal on this
line to report a detected error (a parity error during address
analysis, or other errors detected during the analysis of data
codes).
Instructions of PCI Bus.
The functioning of the PCI Bus can be presented as a sequence of
transactions. A transaction is a session in which a portion of data is
transferred. Any transaction is initiated by a master and supported by a
slave. When a master begins a transaction, it places an instruction on the
C/BE lines during the address-transfer phase.
The PCI standard specifies the following instructions:
Interrupt Acknowledge. One of the READ-type instructions, intended
for the PCI interrupt controllers (during the address phase no instruction
code is asserted; during the data-transfer phase a code is asserted on the
BE lines which indicates the size of the required interrupt identifier);
Special Cycle. This instruction indicates that the master "wishes" to
send a message to one or several slaves;
I/O READ and I/O WRITE. These instructions are used for data
transfer between the master and the selected I/O device;
Memory READ and Memory WRITE. These instructions are used for
transferring data between the master and the main memory;
Configuration READ and Configuration WRITE. These instructions
allow the master to read the information concerning the current slave
configuration and to update its parameters if necessary.
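For reference, the command codes the master drives on the C/BE lines during the address phase can be collected into an enum. The encodings below are the standard PCI command values; they are given here as a reader's aid and are not part of the lecture text.

```c
/* PCI bus command encodings on C/BE[3:0]# during the address phase
 * (standard PCI values, shown for reference). */
enum pci_command {
    PCI_CMD_INTERRUPT_ACK = 0x0,  /* Interrupt Acknowledge */
    PCI_CMD_SPECIAL_CYCLE = 0x1,  /* Special Cycle         */
    PCI_CMD_IO_READ       = 0x2,  /* I/O Read              */
    PCI_CMD_IO_WRITE      = 0x3,  /* I/O Write             */
    PCI_CMD_MEM_READ      = 0x6,  /* Memory Read           */
    PCI_CMD_MEM_WRITE     = 0x7,  /* Memory Write          */
    PCI_CMD_CONFIG_READ   = 0xA,  /* Configuration Read    */
    PCI_CMD_CONFIG_WRITE  = 0xB   /* Configuration Write   */
};
```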
Transaction of PCI Bus
[Figure: A PCI transaction over clock cycles T1 ... T7, showing the CLK, AD, C/BE#, FRAME#, IRDY#, DEVSEL# and TRDY# lines: a READ (address, a "reverse" turnaround cycle, data, with the READ command and byte-enable permission on C/BE#), an empty cycle, then a WRITE (address, data, with the WRITE command and byte-enable permission on C/BE#).]
During cycle T1 (on the falling edge):
1. The master asserts the address on the AD lines;
2. The master asserts the READ command on the C/BE# lines;
3. The master asserts the signal (start of transaction) on the FRAME# line.
During cycle T2 (on the falling edge):
1. The master releases the AD lines so that the slave will be able to use
them during the next cycle;
2. The master changes the contents of the C/BE# lines, pointing out which
of the bytes in the word it will read.
During cycle T3 (on the falling edge):
1. The slave asserts a signal on the DEVSEL# line (confirmation to the
master that it has recognized the address and is ready to respond);
2. The slave asserts the data on the AD lines;
3. The slave asserts a signal on the TRDY# line (confirmation to the
master that the necessary data have been asserted).
Cycle T4 is in reality usually empty (a turnaround cycle).
During cycle T5 the same device initiates a WRITE operation (so cycle T5
is identical to T1).
During cycle T6 the master itself places the data on the AD lines (there is
no need for a "reverse" turnaround cycle).
During cycle T7 the memory accepts the data.
[Figure: Timing of data transfer on an asynchronous bus. Address and operation mode, Ready, Accept and data lines over the instants t0 ... t5.]
[Figure: Timing of data transfer on a synchronous bus. Bus clock (threshold and forbidden range between logic value 0 and logic value 1), address and operation mode, and data over the instants t0, t1, t2; one bus cycle per clock period.]
[Figure: Timing of data transfer on a mixed bus.]
PCI Bus Arbitration
Main Characteristics of PCI Bus
Introduced in ’90 by Intel. Improved by consortium of manufacturers (PCI SIG (Special Interest Group)).
Uses 66 MHz clock, independent from that of the processor;
Capacity 528 Mbytes/s;
Cycle time 30 Ns;
64-bit data and address lines (multiplexed);
Supports up to 16 slots;
Systems also include ISA slots for compatibility;
Mixed synchronous/asynchronous bus;
Centralized bus architecture
Structure of PCI Bus Lines.
There are 5 groups of (required) lines:
System Lines. Through these lines the timing and initial-setup signals are transferred. Namely:
CLK - clock; all processes on the bus are synchronized on its rising (leading) edge. The frequency is 33 (66) MHz;
RST# - reset of all registers, counters and potential signals (return to the initial state).
Informational Lines. 32 (64) lines of this group carry the code signals of addresses and data; the others are used for interpretation and acknowledgement of data validity. Namely:
AD - multiplexed lines used for address and data transfer;
C/BE# - multiplexed lines for bus commands and byte-enable signals. During the data-transfer phase the signals on these lines indicate which of the bytes (4 bytes are transferred at a time) contain the necessary information;
PAR - parity check for the data on the AD and C/BE# lines, with a delay equal to one cycle.
Interface Managing Lines. Through these lines pass the signals that coordinate the work of master and slave during an exchange. Namely:
FRAME# - the current master asserts a signal on this line to inform the other devices that a transaction is starting. The master releases this signal at the moment the completing phase of the transaction begins;
IRDY# - Initiator (Master) Ready. The signal on this line is formed by the current initiator (master). During a Read operation it indicates that the master is ready to accept data; during a Write operation it indicates the validity of the data asserted on the AD lines.
TRDY# - Target Ready. The signal on this line is formed by the current slave (target). During a Read operation it indicates the validity of the data on the AD lines; during a Write operation it indicates that the slave is ready to accept data.
STOP# - the signal on this line is formed by the current slave and informs the master that a situation has arisen that requires stopping the current transaction.
IDSEL - Initialization Device Select. This line is used for selecting a chip during Read or Write operations in the configuration process (every device attached to the PCI bus has 256 internal registers; the states of these registers determine the current slave configuration).
DEVSEL# - Device Select. This line is driven by a slave when it has recognized its own address during the address-transfer phase on the AD lines; for the master it serves as an indication that a slave has been selected.
Lines of Arbitration. Unlike the other lines, these are used by the modules individually: each module attached to the PCI bus is allotted its own pair of lines, connected directly to the bus arbiter. Namely:
REQ# - on this line a device asks the bus arbiter for permission to use the bus.
GNT# - by a signal on this line the arbiter informs the requesting device that it has been granted the use of the bus.
Lines of Errors Indication. Through these lines the signals of detected errors are transferred. Namely:
PERR# - Parity Error. A signal on this line reports that the control system has detected a parity error.
SERR# - System Error. Any device may assert a signal on this line to report a detected error (a parity error during address analysis, or other errors detected during the analysis of data codes).
Instructions of PCI Bus. The functioning of the PCI bus can be presented as a sequential execution of transactions. A transaction is a session of transferring a portion of data. Any transaction is initiated by a master and supported by a slave. When a master begins a transaction, it places an instruction on the C/BE# lines during the address-transfer phase. The PCI standard specifies the following instructions:
Interrupt Acknowledge. A READ-type instruction, intended for PCI interrupt controllers (during the address-transfer phase the instruction code is not asserted; during the data-transfer phase a code on the BE lines indicates the size of the required interrupt identifier);
Special Cycle. This instruction indicates that the master “wishes” to send a message to one or several slaves;
I/O READ and I/O WRITE. These instructions are used for data transfer between the master and the selected I/O device;
Memory READ and Memory WRITE. These instructions are used for transferring data between the master and the main memory;
Configuration READ and Configuration WRITE. These instructions allow the master to read the information concerning the current slave configuration and to update its parameters if necessary.
Questions to Lecture 5.
1. What is the interconnection structure, and by which factors is
it determined?
2. List the types of exchanges (input and output) that are
characteristic of each module; draw a sketch of the CPU
module (indicating the major forms of input and output)
and explain from which modules the CPU receives data
(What kinds of operations are specific to the CPU module?).
3. What kind of buses does the System Bus include? What
function does each of these buses carry out?
4. What do we call the width of a bus? Which parameters of the
Computer System are determined by the widths of the buses
included in the System Bus?
5. What operation does the control signal “I/O read” set?
6. What problems may arise, when only one (single) bus is used
in a computer system?
7. Give examples of using multiple bus structures in computer
systems and explain necessity of including each of the buses
in the system.
8. List and describe main generic types of buses.
9. Which methods of arbitration are used now? What’s the
difference between these methods?
Centralized Arbitration
Single-Level Centralized Arbitration
[Figure: devices 1-6 share a common bus-request line to the arbiter and a common bus-grant line from it.]
Two-Level Centralized Arbitration
[Figure: devices 1-6 are divided between two priority levels; the arbiter has separate request and grant lines for the first-level and the second-level bus.]
Distributed Arbitration
[Figure: devices 1-6 are chained through in/out connections on an arbitration line tied to +5 V, with shared Busy and bus-request lines; no central arbiter is used.]
Arbitration of PCI Bus
[Figure: each device controller has its own REQ#/GNT# pair of lines connected directly to the central PCI bus arbiter.]
Lecture № 6
Memory Subsystem. Internal Memory.
1. Functions and characteristics of Memory subsystem.
2. Semiconductor memories: RAM ( DRAM & SRAM), ROM.
3. Internal organization of Memory Chips.
Literature.
1. Stallings W. Computer Organization and Architecture: Designing for Performance, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer Organization, 4th ed. McGraw-Hill International Editions, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
Memory is the functional part of a computer system that is
intended for accepting, storing and presenting data.
Characteristics:
Location; Capacity; Unit of transfer; Access method; Performance; Physical type; Physical characteristics; Organisation.
Location
CPU (local registers);
Internal (main memory and cache);
External (devices that the CPU can access only through a corresponding I/O module).
Capacity
Expressed as word size and number of words; the natural unit is the word (for external devices, capacity is usually given in bytes).
Unit of Transfer
Internal: usually governed by the data bus width and measured in words of 8, 16 or 32 bits.
External: usually a block, which is much larger than a word and is estimated in bytes or bits (designated N).
Addressable unit: the smallest location that can be uniquely addressed - a word internally, a cluster on some external devices.
Access Methods (1) - these methods are used for access to external devices.
Sequential (the stored data, together with additional address information, are divided into elements called records):
Start at the current position and read through in order;
Access time depends on the location of the data and the previous location;
e.g. tape.
Direct:
Individual blocks have a unique address;
Access is by jumping to the vicinity plus a sequential search;
Access time depends on the location and the previous location;
e.g. disk.
In both types of access (direct and sequential) a combined Read/Write mechanism is used.
Access Methods (2) - these methods are used for access to internal devices.
Random:
Individual addresses identify locations exactly;
Access time is independent of location or previous access;
e.g. RAM.
Associative:
Data are located by comparison with the contents of a portion of the store;
Access time is independent of location or previous access;
e.g. cache.
Performance
Access time:
for random access, the time between presenting the address and getting the valid data;
for direct and sequential access, the time needed to position the Read/Write mechanism at the required location on the medium.
Memory cycle time (TC):
additional time may be required for the memory to “recover” before the next access; cycle time = access time + recovery time.
Transfer rate (R [bit/s]):
for direct and sequential access, R = N/(TN - TA),
where TN is the time to read or write a block of data of N bits,
and TA is the average access time.
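A minimal numeric sketch of this formula in C (the block size and timings below are illustrative assumptions, not values from the lecture):

#include <stdio.h>

int main(void) {
    double N  = 4096.0 * 8.0;  /* block size in bits (hypothetical)     */
    double TN = 0.0262;        /* time to read/write the block, seconds */
    double TA = 0.0250;        /* average access time, seconds          */
    double R  = N / (TN - TA); /* transfer rate, bit/s                  */
    printf("R = %.0f bit/s\n", R);
    return 0;
}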
Physical Types
Semiconductor
RAM
Magnetic
Disk & Tape
Optical
CD & DVD
Others
Hologram
Physical Characteristics
Decay; volatility; erasability; power consumption.
Organisation
The physical arrangement of bits into words; not always obvious, e.g. interleaved memory (address interleaving).
There are trade-offs among the three key characteristics of cost, capacity, and access time:
Smaller access time – greater cost per bit
Greater capacity – smaller cost per bit
Greater capacity – greater access time
To meet performance requirements it is necessary to use
expensive, relatively lower-capacity memories with fast access
time.
The way out of this dilemma is not to rely on a single memory
component or technology, but to employ a memory hierarchy
Memory Hierarchy
Registers
In CPU
Internal or Main memory
May include one or more levels of cache
“RAM”
External memory
Backing store
The Bottom Line
How much?
Capacity
How fast?
Time is money
How expensive?
Table: The Memory Hierarchy
Internal memory:   registers, cache, main memory
External memory:   magnetic disk, CD-R, CD-ROM
Off-line storage:  magnetic tape storage, magneto-optical disks, optical disks
As one goes down the hierarchy, the following occur:
a. Decreasing cost/bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of the Memory by the CPU
Thus: smaller, more expensive, faster memories are
supplemented by larger, cheaper, slower memories
Performance of a simple two-level memory
[Figure: average access time (ms) plotted against the percentage of accesses involving only level 1; as that percentage approaches 100%, the average access time falls toward the level-1 time.]
The basis for the validity of condition d is a principle
known as locality of reference. During the course of
execution of a program, memory references by the
processor, for both instructions and data, tend to
cluster. Over a long period of time the CPU works
primarily with these clusters of memory references.
It is possible to organize data across the hierarchy
such that the percentage of accesses to each
successively lower level is substantially less than
to the level above.
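As a minimal sketch of this effect, assuming a simple two-level model in which a miss costs an access to both levels (T = H*T1 + (1 - H)*(T1 + T2), with H the fraction of accesses satisfied by level 1; all times are hypothetical):

#include <stdio.h>

int main(void) {
    double T1 = 0.01;   /* level-1 access time, us (hypothetical) */
    double T2 = 0.10;   /* level-2 access time, us (hypothetical) */
    for (double H = 0.0; H <= 1.0; H += 0.25) {
        double T = H * T1 + (1.0 - H) * (T1 + T2);
        printf("H = %4.2f  ->  T = %5.3f us\n", H, T);
    }
    return 0;
}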
Other forms of memory may be included in the
hierarchy (Expanded Storage (intermediate storage), Disk
Cache).
Cycle times of semiconductor memories range from a few
hundred nanoseconds to less than 10 nanoseconds.
Memory unit is called RAM if any location can be
accessed for Read or Write operation in some fixed
amount of time that is independent of the location’s
address
Semiconductor Memory
RAM, ROM, PROM, EPROM,
Flash Memory, EEPROM, CMOS
RAM
Read/Write at an arbitrary address (at random)
Volatile
Temporary storage
Static or dynamic
Dynamic RAM
Bits stored as a charge in capacitors
Charges leak
Need refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Main memory
Static RAM
Bits stored as on/off switches (using traditional flip-flop logic gate configurations)
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Cache
Read Only Memory (ROM)
Permanent storage
Micro-programming
Library subroutines
Systems programs (BIOS)
Function tables
CMOS – Complementary Metal-Oxide Semiconductor. CMOS is
intended for storing the Computer current configuration. It stores data
practically without using energy.
Memory cells are usually organized in the form of an array, in
which each cell is capable of storing one bit of information.
For semiconductor memories one of the key
design issues is the number of bits of data that
may be read/written at a time.
At one extreme is an organization in which the physical arrangement of
cells in the array is the same as the logical arrangement (as perceived by the
processor) of words in the memory: the array is organized into W words of
B bits each and B bits are read/written at a time.
Types of ROM
ROM: written during manufacture; very expensive for small runs.
PROM: programmable (once); needs special equipment to program (a programmer).
Read-“mostly” memories:
EPROM (Erasable Programmable ROM): erased by UV light (all of the storage at once);
EEPROM (Electrically Erasable PROM): takes much longer to write than to read;
Flash memory (intermediate between EPROM and EEPROM): the memory is erased electrically.
Organisation in detail
At the other extreme is the so-called “one-bit-per-chip” organization, in which data are read/written one bit at a time.
A 16-Mbit chip can be organised as 1M of 16-bit words.
A one-bit-per-chip system uses 16 separate 1-Mbit chips, with bit 1 of each word in chip 1, and so on.
A 16-Mbit chip can also be organised as a 2048 x 2048 x 4-bit array.
This reduces the number of address pins: the row address and the column address are multiplexed over 11 pins (2^11 = 2048); adding one more pin doubles the range of values, i.e. gives x4 capacity. So far we have gone through the following generations, at a rate of roughly one every three years: 1K, 4K, 16K, ..., 16M.
Refreshing
Refresh circuit included on chip
Disable chip
Count through rows
Read & Write back
Takes time
Slows down apparent performance
Logically, the memory array is organized as 4 square arrays of
2048 by 2048 elements. The elements of the array are connected
by horizontal (row) and vertical (column) lines. Each horizontal
line connects to the Select terminal of each cell in its row, and
each vertical line connects to the Data-In/Sense terminal of each
cell in its column.
Typical 16 Mb DRAM (4M x 4)
Address lines supply the address of the word to be selected. In this
example, 11 address lines select one of 2048 rows; an additional
11 address lines select one of 2048 columns. Four data lines are
used for the input and output of 4 bits to and from a data buffer.
The row line selects which row of cells is used for reading or
writing. Since only 4 bits are read/written to this DRAM, multiple
DRAMs must be connected to the memory controller in order to
read/write a full word of data to the bus.
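The row/column multiplexing just described can be sketched in C as follows (the 22-bit cell address value is an illustrative assumption):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t cell = 0x2ABCDE & 0x3FFFFF;  /* 22-bit cell address (example) */
    uint32_t row  = (cell >> 11) & 0x7FF; /* driven while RAS is asserted  */
    uint32_t col  = cell & 0x7FF;         /* driven while CAS is asserted  */
    printf("row 0x%03X, column 0x%03X\n", row, col);
    return 0;
}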
Integrated circuits are mounted in packages of the DIP (dual in-line
package) type: the pins are located in 2 rows (lines). The number of
pins is usually less than or equal to 32.
Packaging
Fig. (a) shows an example EPROM package (an 8-Mbit chip). It
is a “one-word-per-chip” package. It includes 32 pins, which
support the following signal lines:
(A0-A19) the address of the word being accessed;
(D0-D7) the data to be read out, consisting of 8 lines;
Vcc the power supply;
Vss the ground pin;
CE a chip enable;
Vpp a program voltage that is supplied during programming.
Fig. (b) shows an example DRAM pin package (a 16-Mbit
chip organized as 4M x 4).
Since RAM can be updated, the data pins are input/output.
The write enable (WE) and output enable (OE) pins indicate
whether this is write or read operation.
RAS means row address select, and CAS – column address
select.
Module Organisation (1)
If a RAM chip contains only 1 bit per word, then we need a
number of chips equal to the number of bits per word. Fig.
Module Organisation (1) shows how a memory module
consisting of 256K 8-bit words could be organised. For 256K
words an 18-bit address is needed; it is supplied to the module
from some external source. The address is presented to 8 chips,
each of which provides the input/output of 1 bit.
When a larger memory is required, an array of chips is
needed. A possible organization of a 1-Mbyte memory is shown
in Fig. Module Organisation (2). In this case we have 4
columns of chips, each column containing 256K words. 20 address
lines are needed: 18 of them are routed to all 32 chips, and the
other 2 are input to a group-select logic module, which sends a
chip-enable signal to one of the 4 columns.
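A minimal C sketch of the group-select decoding described above (the 20-bit address value is an illustrative assumption):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr   = 0xC1234 & 0xFFFFF;  /* 20-bit address (example)        */
    uint32_t offset = addr & 0x3FFFF;     /* 18-bit address within a column  */
    uint32_t column = (addr >> 18) & 0x3; /* 2-bit group (chip-enable) select */
    printf("column %u, offset 0x%05X\n", column, offset);
    return 0;
}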
Module Organisation (2)
[Figure: a 1-MByte memory built as 4 columns of 256K x 8 chip groups; 18 address bits go to all chips, and the top 2 bits drive the group-select logic that enables one column.]
Hamming's Correcting Control Code Formation
[Figure: four Venn diagrams (a)-(d) over overlapping regions A, B and C, showing how data bits placed in the intersections, together with a parity bit per region, allow a single-bit error to be located and corrected.]
Questions to Lecture 6
1. Describe the existing methods of access to different types of
memory.
2. Which parameters are used for estimating the performance of
memory devices? What does each of these parameters
characterize?
3. Explain the necessity of employing a memory hierarchy.
4. What is RAM? Describe the distinguishing characteristics of
RAM. What is the difference between DRAM and SRAM?
5. What is ROM?
6. Explain the necessity of implementing EPROM
(EEPROM, Flash Memory).
Error detecting and correcting with the help of correcting codes.
[Figure: an M-bit data word entering memory passes through a function f that produces K check bits; both M and K are stored. On a read, f is recomputed over the stored M bits and compared with the stored K bits; a corrector uses the comparison result to deliver the output data and, if necessary, an error signal.]
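As a hedged illustration of how such check bits can be formed (this follows the general idea of a Hamming code over overlapping regions A, B and C, not necessarily the lecture's exact construction):

#include <stdio.h>

int main(void) {
    int d1 = 1, d2 = 0, d3 = 1, d4 = 1;  /* data bits (example values) */
    int pA = d1 ^ d2 ^ d4;               /* even parity of region A    */
    int pB = d1 ^ d3 ^ d4;               /* even parity of region B    */
    int pC = d2 ^ d3 ^ d4;               /* even parity of region C    */
    printf("check bits: A=%d B=%d C=%d\n", pA, pB, pC);
    /* On a read, recomputing pA..pC and comparing them with the stored
     * values yields a 3-bit syndrome that points at the single flipped
     * bit (a syndrome of 0 means no error). */
    return 0;
}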
Lecture № 7
CACHE MEMORY
1. Purpose and principles of work. Elements of Cache Design.
2. Mapping Function. Direct, associative and set associative techniques.
3. Cache organization in PENTIUM & PowerPC processors.
Literature.
1. Stallings W. Computer Organization and Architecture: Designing for Performance, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer Organization, 4th ed. McGraw-Hill International Editions, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
Cache memory is intended to give a memory speed approaching that of the fastest memories available, while at the same time providing this fast memory at the price of less expensive types of semiconductor memory.
Cache
Small amount of fast memory;
Sits between normal main memory and the CPU (off-chip cache);
May be located on the CPU chip or module (on-chip cache).
Cache operation - overview
The CPU requests the contents of a memory location;
The cache is checked for this data: if present, it is fetched from the cache (fast); if not present, the required block is read from main memory into the cache and then delivered from the cache to the CPU;
The cache includes tags to identify which block of main memory is in each cache slot.
Cache Design
Size (more optimal size: between 1K and 512K);
Mapping function (direct, associative, set associative);
Replacement algorithm (LRU, FIFO, LFU, random);
Write policy (information integrity): write through, write back;
Block size (no definitive optimum value has been found);
Number of caches (single- or two-level, unified or split).
Size does matter
Cost: more cache is expensive.
Speed: a large cache is slightly slower than a small one, and checking the cache for data takes time.
The cache efficiency is characterized by the hit ratio: the ratio of the number of hits in the cache to the total number of CPU accesses to memory.
Typical Cache Organization
Mapping Function
Example used below: a cache of 64 KBytes with cache slots (lines) of 4 bytes, i.e. the cache has 16K (2^14) lines of 4 bytes; a main memory of 16 MBytes with a 24-bit address (2^24 = 16M).
Direct Mapping
Each block of main memory maps to only one cache line: if a block is in the cache, it must be in one specific place.
The address is in two parts: the least significant w bits identify a unique word, and the most significant s bits specify one memory block. The MSBs are split into a cache line field r and a tag of s - r bits (most significant).
Direct Mapping Address Structure
Tag: s - r = 8 bits | Line (slot): r = 14 bits | Word: w = 2 bits   (24-bit address)
The low-order 2 bits select one of 4 words in the 4-byte block;
the remaining 22 bits identify the block;
the 8-bit tag (= 22 - 14) holds the high-order 8 bits of the block's memory address, stored in the tag bits associated with its location in the cache;
the 14-bit slot (line) field determines the cache position of the block.
No two blocks that map to the same line have the same tag field; the cache contents are checked by finding the line and comparing the tag.
Direct Mapping Cache Line Table
Cache line   Main memory blocks held
0            0, m, 2m, ..., 2^s - m
1            1, m+1, 2m+1, ..., 2^s - m + 1
...          ...
m-1          m-1, 2m-1, 3m-1, ..., 2^s - 1
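A minimal C sketch of the 8/14/2 address split used in this direct-mapping example (the address value is an illustrative assumption):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x16339C & 0xFFFFFF;   /* 24-bit address (example)  */
    uint32_t word = addr & 0x3;            /* low 2 bits: word in block */
    uint32_t line = (addr >> 2) & 0x3FFF;  /* next 14 bits: cache line  */
    uint32_t tag  = (addr >> 16) & 0xFF;   /* high 8 bits: tag          */
    printf("tag=0x%02X line=0x%04X word=%u\n", tag, line, word);
    return 0;
}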
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping: advantages & disadvantages
Simple;
Inexpensive;
Fixed location for a given block: if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high.
Associative Mapping
A main memory block can be loaded into any line of the cache;
The memory address is interpreted as a tag and a word;
The tag uniquely identifies the block of memory;
Every line's tag is examined for a match, which makes cache searching expensive.
Associative Mapping Address Structure
Tag: 22 bits | Word: 2 bits
A 22-bit tag is stored with each 32-bit block of data; the tag field of the address is compared with every tag entry in the cache to check for a hit; the least significant 2 bits of the address identify which of the 4 bytes is required from the 32-bit data block.
Associative Mapping Example
e.g. Address FFFFFC: Tag FFFFFC, Data 24682468, Cache line 3FFF
Set Associative Mapping
The cache is divided into a number of sets; each set contains a number of lines; a given block maps to any line in a given set (e.g. block B can be in any line of set i).
e.g. with 2 lines per set we have 2-way associative mapping: a given block can be in one of 2 lines, in only one set.
Set Associative Mapping Example
13-bit set number; the set is the block number in main memory modulo 2^13; addresses 000000, 00A000, 00B000, 00C000, ... map to the same set.
Two Way Set Associative Cache Organization
Set Associative Mapping Address Structure
Tag: 9 bits | Set: 13 bits | Word: 2 bits
Use the set field to determine which cache set to look in, then compare the tag field to see if we have a hit.
e.g. Address 1FF 7FFC: Tag 1FF, Data 12345678, Set number 1FFF
     Address 001 7FFC: Tag 001, Data 11223344, Set number 1FFF
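A matching C sketch for the 9/13/2 split of this set-associative example (again with an illustrative address value):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x02C000 & 0xFFFFFF;   /* 24-bit address (example) */
    uint32_t word = addr & 0x3;            /* low 2 bits               */
    uint32_t set  = (addr >> 2) & 0x1FFF;  /* 13-bit set number        */
    uint32_t tag  = (addr >> 15) & 0x1FF;  /* 9-bit tag                */
    printf("tag=0x%03X set=0x%04X word=%u\n", tag, set, word);
    return 0;
}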
Two Way Set Associative Mapping Example
Replacement Algorithms (1)
Direct mapping: no choice - each block maps to only one line, so that line is replaced.
Replacement Algorithms (2)
Associative & set associative: the algorithm is implemented in hardware (for speed).
Least recently used (LRU): e.g. in a 2-way set associative cache, which of the 2 blocks is LRU?
First in first out (FIFO): replace the block that has been in the cache longest.
Least frequently used (LFU): replace the block that has had the fewest hits.
Random.
Write Policy
A cache block must not be overwritten unless main memory is up to date; multiple CPUs may have individual caches, and I/O may address main memory directly.
Write through
All writes go to main memory as well as to the cache, so multiple CPUs can monitor main-memory traffic to keep their local caches up to date. Disadvantages: lots of traffic; slows down writes.
Write back
Updates are initially made in the cache only; an update bit for the cache slot is set when an update occurs; if a block is to be replaced, it is written to main memory only if its update bit is set. Disadvantages: other caches can get out of sync, and I/O must access main memory through the cache. (Typically about 15% of memory references are writes.)
Number of Caches
On-chip cache (L1): reduces the processor's external bus activity, speeds up execution times and increases overall system performance.
External cache (L2): if an L2 SRAM cache is used, the missing information can frequently be retrieved quickly; the data can be accessed using the fastest type of bus transfer.
Contemporary designs include both L1 and L2 caches. The potential savings from the use of an L2 cache depend on the hit rates in both the L1 and L2 caches.
The 80386 does not include an on-chip cache. The 80486 includes a single on-chip cache of 8 KBytes. The initial Pentium includes 2 on-chip caches (L1), one for data and one for instructions, each of 8 KBytes, using a line size of 32 bytes and a two-way set associative organization. The Pentium Pro and Pentium II include 2 on-chip caches (L1) (size 8-16 KBytes) and one off-chip cache (L2) of size from 256 KBytes up to 1 MByte.
Pentium Cache Organization
The core of the processor includes four main nodes (units):
Fetch/Decoding node: fetches instructions in order from the code cache (L1), decodes them, forms a sequence of micro-instructions and saves them in the micro-instructions buffer.
Micro-instructions buffer: stores the current sequence of micro-instructions prepared for execution.
Distribution/Execution node: plans the execution of micro-operations taking into account their data dependences and the availability of the necessary resources (which is why instructions can be executed in an order that differs from the sequence in which they entered the micro-instructions buffer). This node organizes the speculative execution of micro-operations. After executing micro-operations, it fetches results from the cache and stores them in the processor's registers.
Termination node: determines when the result of a speculatively executed micro-operation can be considered final and must be committed to the data cache; it also deletes from the buffers those instructions that are no longer necessary.
Pentium II processor block diagram
[Figure: the interface node with the bus connects the system bus and the L2 cache (256K-1M) to the on-chip L1 code cache (8-16K) and L1 data cache (8-16K); the fetch/decoding node READs from the code cache and fills the micro-instructions buffer, which feeds the distribution/execution node (LOAD/STORE to the data cache) and the termination node.]
Structure of Pentium II Internal Data Cache
[Figure: the data cache consists of two ways (set/way 0 and set/way 1), each with a directory and two banks of 128 elements of 32 bytes (4 KBytes per way); the service fields hold the LRU bit and the directory entry state bits.]
Data Cache Consistency
To provide cache consistency, the data cache supports the MESI protocol (modified/exclusive/shared/invalid). The data cache includes two status bits per tag, so each line can be in one of four states:
Modified: the line in the cache has been modified and differs from that in main memory; it is available only in this cache.
Exclusive: the line in the cache is the same as that in main memory and is not present in any other cache.
Shared: the line in the cache is the same as that in main memory and may be present in another cache.
Invalid: the line in the cache does not contain valid data.
Cache Control
The internal cache is controlled by two bits of the control registers: CD (cache disable) and NW (not write through). There are two Pentium instructions that can be used to control the cache: INVD flushes the cache memory and signals the external cache (if any) to flush; WBINVD performs the same function but also signals an external write-back cache to write modified blocks back before flushing.
Table: PowerPC Internal Caches
Model        Caches  Size       Bytes/Line  Organization
PowerPC 601  1       32 KBytes  32          8-way set associative
PowerPC 603  2       8 KBytes   32          2-way set associative
PowerPC 604  2       16 KBytes  32          4-way set associative
PowerPC 620  2       32 KBytes  64          8-way set associative
PowerPC Cache Organization
PowerPC 620 (G3) block diagram
[Figure: the instruction unit fetches 128 bits at a time from the 32-KByte instruction cache; the integer ALUs with their integer registers and the load/store unit, together with the floating-point registers and floating-point ALU, exchange 64-bit data with the 32-KByte data cache, which connects to the L2 cache and system bus through a 128-bit L2/bus interface.]
Questions to Lecture № 7
1. What is the main purpose of implementing cache memory?
2. Describe the principles of cache memory operation.
3. Enumerate the elements of cache design.
4. Draw a block diagram of the Pentium processor and explain
the functions of its main nodes.
5. How is data cache consistency ensured?
Lecture № 8
External Memory
1. Types of external memory. Data organization and formatting.
2. RAID (Six levels of RAID).
3. Optical memory.
Literature.
1. Stallings W. Computer Organization and Architecture: Designing for Performance, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer Organization, 4th ed. McGraw-Hill International Editions, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
Types of External Memory
Magnetic disk: RAID; removable.
Optical: CD-ROM; CD-Writable (WORM); CD-R/W; DVD.
Magnetic tape.
Magnetic Disk
A metal or plastic disk coated with magnetizable material (iron oxide ... rust).
Range of packaging: floppy; Winchester hard disk; removable hard disk.
Data Organization and Formatting.
Concentric rings, or tracks (between 500 and 2000 tracks on one side);
Tracks are divided into sectors (data are read by blocks = sectors; there can be between 10 and 100 sectors per track);
Adjacent tracks are separated by gaps; this prevents errors due to misalignment of the head (a conducting coil);
To simplify the electronics, the same number of bits is typically stored on each track.
Winchester Disk Track Format
[Figure: one track contains 30 fixed-length physical sectors of 600 bytes each. Each physical sector consists of Gap 1 (17 bytes), an ID field (7 bytes), Gap 2 (41 bytes), a data field (515 bytes) and Gap 3 (20 bytes); an index mark identifies the start of physical sector 0. The ID field holds a synch byte (1 byte), track # (2 bytes), head # (1 byte), sector # (1 byte) and a CRC (2 bytes); the synch byte identifies the start of the ID field. The data field holds a synch byte (1 byte), 512 data bytes and a CRC (2 bytes) holding the control sum of the field.]
Disk Data Layout
[Figure: concentric tracks divided into sectors, with inter-track gaps between the tracks and inter-record gaps between the sectors.]
Data are transferred to and from the disk in blocks; accordingly, data are stored in block-size regions known as sectors. To avoid imposing unreasonable precision requirements, adjacent sectors are separated by inter-record gaps. In order to identify positions within a track, there must be starting points on the tracks and ways of identifying the start and the end of each sector. These requirements are handled by means of control data recorded on the disk. Thus the disk is formatted with some extra data used only by the disk drive and not accessible to the user. In Fig. Winchester Disk Track Format each track contains 30 fixed-length sectors of 600 bytes each. Every sector holds 512 bytes of data, plus control information useful to the disk controller. The ID field is a unique identifier, or address, used to locate a particular sector. The SYNCH byte is a special bit pattern that marks the beginning of the field. The track number identifies a track, and the head number identifies a head, since this disk has multiple surfaces. The ID and data fields each contain an error-detecting code (CRC).
Characteristics of Disk Systems
Characteristic      Possible values
Head motion         Fixed head (one per track); movable head (one per surface)
Disk portability    Non-removable disk; removable disk
Sides               Single-sided; double-sided
Platters            Single-platter; multiple-platter
Head mechanism      Contact (floppy); fixed gap; aerodynamic gap (Winchester)
Disk Access Time
Disk access time is the main characteristic of disk performance. If movable heads are used and the disk drive is operating, then to read or write, the head must be positioned at the desired track and at the beginning of the desired sector on that track. The time it takes to position the head at the track is known as the seek time. In either case, once the track is selected, the system waits until the appropriate sector rotates to line up with the head. The time it takes for the sector to reach the head is known as the rotational latency.
RAID (Six [seven] levels of RAID).
With the use of multiple disks there is a wide variety of ways in which the data can be organised and in which redundancy can be added to improve reliability. Industry has agreed on a standardised scheme for multiple-disk database design, known as RAID (Redundant Array of Independent Disks). The RAID scheme consists of six levels. These levels do not imply a hierarchical relationship but designate different design architectures that share three common characteristics:
1. RAID is a set of physical disk drives viewed by the operating system as a single logical drive.
2. Data are distributed across the physical drives of the array.
3. Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure.
RAID systems of different levels differ in how they realise the second and third characteristics.
Disk Access Time is the main characteristic of disk performance.
It is equal to the sum of the seek time and the rotational latency.
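A minimal C sketch of this sum (all drive parameters are illustrative assumptions; the average rotational latency is taken as half a revolution, and the transfer time for one sector is added for completeness):

#include <stdio.h>

int main(void) {
    double rpm        = 7200.0;           /* spindle speed (example)       */
    double seek_ms    = 9.0;              /* average seek time, ms         */
    double sectors    = 100.0;            /* sectors per track (example)   */
    double rev_ms     = 60000.0 / rpm;    /* one revolution, ms            */
    double latency_ms = rev_ms / 2.0;     /* average rotational latency    */
    double xfer_ms    = rev_ms / sectors; /* one sector passing the head   */
    printf("access time = %.2f ms\n", seek_ms + latency_ms + xfer_ms);
    return 0;
}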
RAID, Level 0
No redundancy (not a true member of the RAID family);
Data are striped across all disks, with a round-robin stripe organization.
All the user and system data are viewed as being stored on a logical disk. The disk is divided into strips; these strips may be physical blocks, sectors or some other units. The strips are mapped round-robin to consecutive array members. A set of logically consecutive strips that maps exactly one strip to each array member is referred to as a stripe. In an n-disk array, the first n logical strips are physically stored as the first strip on each of the n disks, the second n strips as the second strip on each disk, and so on.
Data Mapping for a RAID Level 0 Array
[Figure: array management software maps the strips of the logical disk round-robin onto physical disks 0-3: physical disk 0 holds strips 0, 4, 8, 12; disk 1 holds strips 1, 5, 9, 13; disk 2 holds strips 2, 6, 10, 14; disk 3 holds strips 3, 7, 11, 15.]
RAID 1 Mirrored
Mirrored disks: data are striped across the disks, with 2 copies of each stripe kept on separate disks;
Read from either copy; write to both;
Recovery is simple: swap the faulty disk and re-mirror, with no downtime;
Expensive.
RAID 2 Redundancy Through Hamming Code
The disks are synchronized, with very small strips;
Error correction is calculated across corresponding bits on the disks;
Multiple parity disks store the Hamming-code error correction in corresponding positions;
Lots of redundancy.
[Figure: strip layouts for RAID level 1 (each strip stored twice, on two mirrored sets of disks) and RAID level 2 (very small synchronized strips plus Hamming-code disks).]
RAID 2 is expensive and is not used in practice.
RAID 3
Bit-Interleaved Parity
Similar to RAID 2, but with only one redundant disk, no matter how large the array;
A simple parity bit is computed for each set of corresponding bits;
Data on a failed drive can be reconstructed from the surviving data and the parity information (see the sketch below);
Very high transfer rates.
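A minimal C sketch of RAID-3-style parity (strip contents and sizes are illustrative assumptions): the parity strip is the XOR of the data strips, and a lost strip is rebuilt by XOR-ing the survivors with the parity.

#include <stdint.h>
#include <stdio.h>

#define NDISKS 4
#define STRIP  8   /* bytes per strip (tiny, for illustration) */

int main(void) {
    uint8_t data[NDISKS][STRIP] = {{1,2,3,4,5,6,7,8},
                                   {9,8,7,6,5,4,3,2},
                                   {0,1,0,1,0,1,0,1},
                                   {7,7,7,7,7,7,7,7}};
    uint8_t parity[STRIP] = {0};

    /* parity disk = XOR of all data disks, byte by byte */
    for (int d = 0; d < NDISKS; d++)
        for (int i = 0; i < STRIP; i++)
            parity[i] ^= data[d][i];

    /* reconstruct failed disk 2 from the survivors and the parity */
    uint8_t rebuilt[STRIP];
    for (int i = 0; i < STRIP; i++) {
        rebuilt[i] = parity[i];
        for (int d = 0; d < NDISKS; d++)
            if (d != 2) rebuilt[i] ^= data[d][i];
    }
    for (int i = 0; i < STRIP; i++)                     /* prints 0 1 0 1 0 1 0 1 */
        printf("%d%s", rebuilt[i], i + 1 < STRIP ? " " : "\n");
    return 0;
}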
RAID 4 Block-Level Parity
Each disk operates independently;
Good for a high I/O request rate;
Large strips, with a block-level parity strip stored on a dedicated parity disk.
[Figure: in RAID level 2 the Hamming-code bits f0(b), f1(b), f2(b) computed over data bits b0-b3 are stored on separate disks; in RAID level 3 a single parity bit P(b), calculated bit by bit across the corresponding strips of the data disks, is stored on one parity disk.]
RAID 5 Block-Level Distributed Parity
Like RAID 4, but the parity is striped across all disks, with a round-robin allocation of the parity strips;
This avoids the RAID 4 bottleneck at the parity disk;
Commonly used in network servers.
RAID 6 Redundancy Through 2 Different Codes
The scheme computes 2 control codes, stored in different blocks distributed across all the disks.
The control codes P and Q are calculated by different algorithms, which allows lost data to be restored even when two disks have failed.
[Figure: strip layouts for RAID levels 5 and 6. In RAID 5 the parity blocks P(0-3), P(4-7), P(8-11), P(12-15), P(16-19) rotate round-robin across all disks alongside the data strips. In RAID 6 every stripe carries two independent check blocks, P(...) and Q(...), also distributed across the disks.]
The RAID 6 hardware is more complicated.
Optical Memory. Optical disk products:
CD: a nonerasable disk that stores digitized audio information. The standard system uses 12-cm disks and can record more than 60 minutes of uninterrupted playing time.
CD-ROM: a nonerasable disk used for storing computer data. The standard system uses 12-cm disks and can hold more than 550 Mbytes.
DVD: digital video disk. A technology for recording video signals and other large volumes of data, based on data-compression methods.
WORM: Write-Once Read-Many; more easily written than CD-ROM, making single-copy disks commercially feasible; holds from 200 to 800 Mbytes of data.
Erasable optical disk: a disk that uses optical technology but can be easily erased and rewritten. A typical capacity is 650 Mbytes.
Both the audio CD and the CD-ROM share similar technology; the main difference is in the formats of data presentation.
Optical Storage: CD-ROM
Originally designed for audio;
650 (775) MBytes, giving over 70 (73.2) minutes of audio;
Polycarbonate coated with a highly reflective coat (aluminium);
Data stored as pits, read by a reflected laser;
Constant packing density; constant linear velocity (1.2 m/s).
CD-ROM Block Format (2352 bytes per block)
Sync (12 bytes: 00, ten bytes of FF, 00) | Id (4 bytes: Min, Sec, Sector, Mode) | Data (2048 bytes) | Layered ECC (288 bytes)
Mode 0 = blank data field; Mode 1 = 2048 bytes of data + error correction; Mode 2 = 2336 bytes of data.
The CD-ROM block format consists of the following fields:
1. Sync: identifies the beginning of a block;
2. Header (Id): contains the block address and the mode byte;
3. Data: user data;
4. Auxiliary: additional user data in mode 2; in mode 1, this is the 288-byte error-correcting code.
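As a worked example of this format: assuming the typical figures of 75 sectors per second and about 74 minutes of playing time (these figures are assumptions, not stated in the lecture), the mode-1 user-data capacity can be computed as follows:

#include <stdio.h>

int main(void) {
    long sectors = 74L * 60L * 75L;          /* 333,000 sectors            */
    long bytes   = sectors * 2048L;          /* 2048 user-data bytes each  */
    printf("capacity = %ld bytes (~%ld Mbytes)\n",
           bytes, bytes / (1024L * 1024L));  /* ~650 Mbytes                */
    return 0;
}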
Random Access on CD-ROM
Random access is difficult: move the head to the rough position, set the correct speed, read the address, then adjust to the required location.
Other Optical Storage
CD-Writable: WORM; now affordable; compatible with CD-ROM drives.
CD-RW: erasable; getting cheaper; mostly compatible with CD-ROM drives.
DVD Storage
Digital Video Disk: used to indicate a player for movies; only plays video disks.
Digital Versatile Disk: used to indicate a computer drive; will read computer disks and play video disks.
DVD technology
Multi-layer;
Very high capacity (4.7 GBytes per layer);
A full-length movie on a single disk, using MPEG compression;
Finally standardised (honest!);
Movies carry regional coding: players only play films of the correct region (this can be “fixed”).
DVD-Writable
Loads of trouble with standards:
First-generation DVD drives may not read first-generation DVD-W disks;
First-generation DVD drives may not read CD-RW disks.
Magnetic Tape
Serial access
Slow
Very cheap
Backup and archive
Questions to Lecture 8
1. Why can RAID 0 not be considered a true member of the RAID family?
Compare RAID 5 and RAID 6 (illustrate the answer with pictures).
2. List the well-known optical disk products and describe their characteristics.
3. Give an example of the CD-ROM block format.
4. List the major characteristics of a disk system.
5. How is the disk access time evaluated? What does the disk access time
characterize? What is RAID? List the three common characteristics of RAID.
6. Describe the typical disk data layout (draw a picture).
7. How are sector positions within a track identified? Give an example of a disk
track format (describe the meaning of each field).
Lecture № 9
Virtual Memory
1. Virtual Memory Techniques.
2. Virtual Memory Address Translation.
3. Use of an associative-mapped TLB.
Literature.
1. Stallings W. Computer Organization and Architecture: Designing for Performance, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
2. V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky. Computer Organization, 4th ed. McGraw-Hill International Editions, 1996.
3. Tanenbaum, A.S. Structured Computer Organization, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
Virtual-memory Technique
Usually only some parts of a program are brought into main memory when it is first executed; when a new part (segment) of the program is to be moved into an already full memory, it must replace another segment that is already there. In modern computers, the operating system moves programs and data automatically between main memory and secondary storage. Techniques that automatically move program and data blocks into physical main memory when they are required for execution are called virtual memory techniques. The virtual memory mechanism bridges the size and speed gaps between main memory and secondary storage and is usually implemented in part by software techniques. Programs, and hence the processor, reference an instruction and data space that is independent of the available physical main-memory space. The binary addresses that the processor issues for either instructions or data are called virtual, or logical, addresses.
These addresses are translated into physical addresses by a combination of hardware and software components. If a virtual address refers to a part of the program or data space that is currently in physical memory, then the contents of the appropriate location in memory are accessed immediately. On the other hand, if the referenced address is not in main memory, its contents must be brought into a suitable location in main memory before they can be used.
The figure Virtual Memory Organization shows a typical
organization that implements virtual memory. A special hardware unit,
called the Memory Management Unit (MMU), translates virtual
addresses into physical addresses. When the desired data are in main memory, these data are fetched as in the case of the cache mechanism. If the data are not in main memory, the MMU causes the operating system to bring the data into memory from disk. The transfer of data between the disk and main memory is performed using DMA.
Virtual Memory Organization
[Figure: the processor issues a virtual address to the MMU, which produces the physical address used to access the cache and the main memory; data move between the disk storage and the main memory by DMA transfer.]
Virtual Memory Address Translation
[Figure: the virtual address from the processor is split into a virtual page number (high-order bits) and an offset (low-order bits). The page table base register holds the page table address; adding the virtual page number to it selects a page-table entry, whose control bits and page frame number are read from the page table in memory. The page frame is concatenated with the offset to form the physical address in main memory.]
Address Translation. A simple method for translating virtual addresses into physical addresses is to assume that all programs and data are composed of fixed-length units called pages. Each page consists of a block of words that occupy contiguous locations in the main memory. Pages commonly range from 2K to 16K bytes in length. They constitute the basic unit of information that is moved between the main memory and the disk whenever the translation mechanism determines that a move is required. Pages should not be too small, because the access time of a magnetic disk is much longer than that of the main memory. The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage and is usually implemented in part by software techniques. A virtual memory address translation method based on the concept of fixed-length pages works as follows: each virtual address generated by the processor, whether for an instruction fetch or an operand fetch/store operation, is interpreted as a virtual page number (the high-order bits) followed by an offset (the low-order bits) that specifies the location of a particular byte (or word) within the page. Information about the main-memory location of each page is kept in a page table. This information includes the main-memory address where the page is stored and the current status of the page. An area in the main memory that can hold one page is called a page frame. The starting address of the page table is kept in a page table base register. By adding the virtual page number to the contents of this register, the address of the corresponding entry in the page table is obtained. The contents of this location give the starting address of the page if that page currently resides in the main memory. Each entry in the page table also includes some control bits that describe the status of the page while it is in the main memory. One bit indicates the validity of the page, that is, whether the page is actually loaded in the main memory. This bit allows the operating system to invalidate the page without actually removing it. Another bit indicates whether the page has been modified during its residency in the memory. Other control bits indicate various restrictions that may be imposed on accessing the page. For example, a program may be given full read and write permission, or it may be restricted to read access only. The page table information is used by the MMU for every read and write access. An access to a word specified by a virtual address thus demands two operations with the main memory: one to fetch the page-table entry and one to access the data. A straightforward virtual memory scheme would therefore have the effect of doubling the memory access time. To overcome this problem, most virtual memory schemes make use of a special cache for page-table entries. The page table is kept in the main memory; however, a copy of a small portion of it can be accommodated within the MMU. This portion consists of the page-table entries that correspond to the most recently accessed pages. A small cache, usually called the Translation Lookaside Buffer (TLB), is incorporated into the MMU for this purpose. In addition to the information that constitutes a page-table entry, the TLB includes the virtual address of the entry. Address translation proceeds as follows. Given a virtual address, the MMU looks in the TLB for the referenced page. If the page-table entry for this page is found in the TLB, the physical address is obtained immediately. If there is a miss in the TLB, then the required entry is obtained from the page table in the main memory, and the TLB is updated.
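A minimal C sketch of this translation under assumed parameters (4-KByte pages and a 16-entry one-level page table; all names and values are hypothetical):

#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 12   /* 4-KByte pages (assumption) */
#define NPAGES      16   /* tiny page table (assumption) */

typedef struct {
    int      valid;      /* control bit: is the page in main memory?  */
    uint32_t frame;      /* page frame number in main memory          */
} PTE;

int main(void) {
    PTE page_table[NPAGES] = {0};
    page_table[3] = (PTE){1, 0x2A};             /* example resident page */

    uint32_t va  = (3u << OFFSET_BITS) | 0x1F4; /* page 3, offset 0x1F4  */
    uint32_t vpn = va >> OFFSET_BITS;           /* virtual page number   */
    uint32_t off = va & ((1u << OFFSET_BITS) - 1);

    if (page_table[vpn].valid) {
        uint32_t pa = (page_table[vpn].frame << OFFSET_BITS) | off;
        printf("VA 0x%05X -> PA 0x%05X\n", va, pa);
    } else {
        printf("page fault: the OS must bring the page in from disk\n");
    }
    return 0;
}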
Use of an Associative-Mapped TLB
[Figure: the virtual page number of the virtual address from the processor is compared associatively (=?) with the virtual page numbers stored in the TLB entries, each of which also holds control bits and a page frame number; on a hit the page frame is concatenated with the offset to give the physical address in main memory; on a miss the translation is obtained from the page table.]
Questions to Lecture 9
1. Give a definition of virtual memory techniques. Draw a
scheme of the virtual memory organization and explain the
role of the MMU in this scheme.
2. Describe the process of virtual memory address translation
(draw a scheme of this process).
Glossary on the course Computer Organization and Architecture
Address Translation; Architecture of the Computer System (CS); Bus; Bus Structure; Cache; Clock cycle; CMOS; Computer; Computer Performance; Central Processing Unit (CPU); Data; Disk Access Time; EEPROM; EPROM; External Memory; Flash Memory; Format; Function; Hardware; Information; Interface; I/O Module; Memory; Memory Hierarchy; Mezzanine Architecture; Organization of the CS; PCI Bus; PROM; Protocols; RAM; Redundant Array of Independent Disks; Replacement Algorithms; ROM; Semiconductor Memory; Software; Structure; TLB; Virtual-memory Technique
Data are the base elements of information, such as numbers, letters,
symbols and so on, which are processed or carried by a human or a
computer (or by some machine) [sometimes the information itself,
prepared for certain purposes (in a special form), is considered as
data].
Information is the meaning conferred on the data.
Format is a way of data representation, or a scheme of data
positioning.
Computer is a device, or a complex of devices, intended for the
mechanization or automation of data processing, and constructed on
the basis of electronic elements (transistors, logic circuits, magnetic
elements and so on).
[Analog Computer is a computing device that processes data
given in the form of continuously changing physical values whose
magnitudes can be measured (such values may be angular or
linear displacements, electric voltage, electric current, time and so
on). These analog values are processed by mechanical or some
other physical methods, and the results of such operations are
measured. Computers of this type are usually used for solving
equations that describe processes in real time, when the initial
data are input from special measuring monitors.]
[Digital Computer is an electronic computing device that
receives discrete input data, processes it in accordance with the
list of instructions stored inside it, and generates the resulting output
data. (Instructions may be considered a special type of data,
coded in correspondence with a format; these instructions:
a) manage data transfer both inside the computer itself and between
the computer's internal and peripheral (input-output) devices, b)
determine the arithmetic or logic operations to be performed.)]
[Hybrid Computer is a computing system in which elements of
analog and digital computers are combined. These computers solve
equations using analog devices, while digital devices are used for
storage, further processing and the representation of results.]
The composition of a computer is called its configuration.
Hardware consists of tangible (palpable) objects: integrated
circuits, printed boards, cables, memory devices, printers, other
technical devices and physical equipment.
Software is the detailed instructions that control the operation of a
computer system.
Interface is:
(1) a relation between two processing components;
(2) a complete complex of agreements (a language in the
common sense) concerning input and output signals, by
which the following pairs of data processors may exchange:
computer device and computer device; program and program
medium; human being and data-processing system; and
some others. These agreements are called protocols. Protocols
are sequences of technical requirements that the designer of any
device must satisfy so that its work is successfully compatible
(concordant) with other devices.
Definition Architecture of the Computer System (CS) is a
specification of its interfaces, which determines data
processing and includes: methods of data coding, system of
instructions, principles of software-hardware interaction. It
is also determined as a set of information, which is
necessary and sufficient for programming in the machinery
code.
Definition. The operational units and their interconnections
that realize the architecture of the CS are the Organization of
the CS. All members of the Intel x86 family share the same basic architecture;
the IBM System/370 family shares the same basic architecture.
This gives code compatibility and software succession. Organization differs between different versions; architecture is more conservative than organization.
Structure is the way of merging (uniting) components of some subsystem in one (whole) unit.
Function is an operation of individual component as a part of the structure.
Definition. The Computer Performance (CP) is determined by the
number of certain (well-known) operations per unit of time.
The generalized estimation of the CP is the number of
transactions per second.
The basic performance characteristics of a computer
system: processor speed, memory capacity, interconnection
data rates.
The instruction fetch consists of reading an instruction from a
location in the memory. The instruction execution may involve several operations and depends
on the nature of the instruction.
Address Space (AS) is the set of addresses that the
microprocessor is able to generate.
The way of connecting the various modules is called the
interconnection structure. The interconnection structure is determined by character of exchange
operations, which are specific for each module.
Major forms of input and output for the modules:
Memory: Typically, a memory module will consist of N words of equal
length. Each word is assigned a unique numerical address (0, 1, …, N-1).
A word of data can be read from or written into the memory. The nature
of the operations is indicated by READ or WRITE control signals. The
location for the operation is specified by an address.
I/O Module: It’s functionally similar to the memory (from internal point
of view). There are two operations READ and WRITE. Further, an I/O
module may control more than one external device. We can refer to each
of the interfaces to an external device as a port and give each a unique
address (e.g., 0, 1, 2.,…, M-1). In addition, there are external data paths
for the input and output of data with an external device. Finally, an I/O
module may be able to send interrupt signals to the CPU.
CPU: CPU reads in instructions and data, writes out data after processing,
and uses control signals to control the overall operation of the system. It
also receives interrupt signals.
Types of transfers supported by interconnection structure.
Memory to CPU: The CPU reads an instruction or unit of data from
memory.
CPU to Memory: The CPU writes a unit of data to memory.
I/O to CPU: The CPU reads data from I/O device via an I/O module.
CPU to I/O: The CPU sends data to the I/O device.
I/O to or from the Memory: For these two cases, an I/O module is
allowed to exchange data directly with memory, without going through
the CPU, using direct memory access (DMA).
Multiplexer is a functional device that permits two or more
data-link channels to share the same common data-transfer
device.
A bus is a set of electric pathways and service
electronic devices (framing), providing exchange
of data among computer units and devices.
A communication pathway connecting two or more devices is a bus.
Bus Structure
A system bus consists, typically, of from 50 to 100 separate lines, which can
be classified into three functional groups: data, address and control lines
(power lines are usually omitted ).
Data Bus (Line)
The data lines provide a path for moving data between system modules.
Number of lines is referred as WIDTH of the data bus (the number of lines
determines how many bits can be transferred at a time)
Address Bus (Line)
Identify the source or destination of data
(e.g. CPU needs to read an instruction (data) from a given
location in memory)
Address Bus width determines maximum memory capacity of the
system.
Used to address both the Main Memory and I/O ports (the higher-order
bits select a particular module on the bus, and the lower-order bits
select an address in the Memory, or an I/O port, within the module).
E.g., if the width of the bus is 8 bits, then codes 01111111 and below
specify cell addresses in the Main Memory module (the module with
address 0), and codes from 10000000 upward specify I/O ports
controlled by the module with address 1.
Control Bus(Line)
Is used to control the access to and the use of the data and
address lines.
Control and timing information(indicate validity of data and address information)
Memory read/write signal
Interrupt request
Typical control lines include:
Memory Write: Causes data on the bus to be written
into the addressed location.
Memory Read: Causes data from the addressed
location to be placed on the bus.
I/O Write: Causes data on the bus to be output to
the addressed I/O port.
I/O Read: Causes data from the addressed I/O port
to be placed on the bus.
Transfer ACK: Indicates that data have been
accepted from or placed on the bus.
Bus Request: Indicates that a module needs to gain
control of the bus.
Bus Grant: Indicates that a requesting module has
been granted control of the bus.
Interrupt request: Indicates that interrupt is
pending.
Interrupt ACK: Acknowledges that the pending
interrupt has been recognized.
Clock: Used to synchronize operations.
Reset: Initializes all modules.
The operation of any bus is as follows:
If one of the modules “wishes” to send data to
another, it must do two things:
1. Obtain the use of the bus;
2. Transfer data through the bus.
If one of the modules “wishes” to receive data from
another module, it must:
1. Obtain the use of the bus;
2. Send a request to the other module by putting the
corresponding code on the address lines after forming
signals on the appropriate control lines.
Computer systems contain a number of different
buses that provide pathways between components at
various levels of the computer systems hierarchy.
A bus that connects major computer components (CPU,
Memory, I/O) is called a System Bus.
Up to now the Traditional Bus Architecture has been widely used. In this
case the computer system includes a Local Bus, which connects the CPU,
cache memory and some peripheral devices. The cache memory controller
provides connections not only to the Local Bus but to the System Bus as
well (all modules of the main memory are connected to the System Bus).
Under such a structure all input-output processes go through the System
Bus, bypassing the CPU, which allows the CPU to perform more
important operations.
Connecting peripheral devices not directly to the System Bus but to an
additional bus - the Expansion Bus, which buffers the data circulating
between the main memory and the peripheral-device controllers - makes it
possible to support a large variety of external devices and, at the same
time, to separate the information flows “CPU - Memory” and
“Memory - I/O Controllers”.
The appearance of new high-performance external devices demands an
increase in the speed of data transfer through the buses; that is why one more
bus, a High-Speed Bus, is often used in contemporary computer systems. This
bus unites high-speed external devices and is connected with the System Bus
through a special matching module, a Bridge. This kind of
structure is called the Mezzanine Architecture.
The advantage of this structure: high-speed peripheral devices are
integrated with the processor and at the same time they may work
independently. It means that the functioning of the bus doesn’t
depend on the CPU architecture and vice versa.
With synchronous timing, the bus includes a clock line upon which a clock
transmits a regular sequence of alternating 1s and 0s of equal duration.
A single 1-0 transmission is referred to as a clock cycle (bus cycle) and
defines a time slot. All other devices on the bus can read the clock line,
and all events start at the beginning of a clock cycle. Other bus signals
may change at the leading edge of the clock signal.
With asynchronous timing, the occurrence of one event on a bus
follows and depends on the occurrence of a previous event.
In actual implementations, electronic switches are used. The output gate of
a register is capable of being electrically disconnected from the bus or of
placing a 0 or a 1 on the bus. Because it supports these three possibilities,
such a gate is said to have a three-state output. A separate control input is
used either to enable the gate output to drive the bus to 0 or to 1, or to
put it in a high-impedance (electrically disconnected) state. The latter state
corresponds to the open-circuit state of a mechanical switch.
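As a rough software model of such a gate (not from the lecture), the following C sketch represents the three output possibilities with an enum, HI_Z standing for the high-impedance state; the names are illustrative assumptions:

    #include <stdio.h>

    typedef enum { DRIVE_0, DRIVE_1, HI_Z } tristate_t;

    /* enable selects whether the gate drives the bus or floats. */
    static tristate_t gate_output(int enable, int value) {
        if (!enable) return HI_Z;          /* electrically disconnected */
        return value ? DRIVE_1 : DRIVE_0;  /* drive the bus to 1 or 0   */
    }

    int main(void) {
        printf("%d\n", gate_output(1, 1)); /* DRIVE_1 */
        printf("%d\n", gate_output(0, 1)); /* HI_Z    */
        return 0;
    }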
PCI Bus
Peripheral Component Interconnect: a high-bandwidth, processor-independent bus that functions as a mezzanine or peripheral bus.
Intel released the specification to the public domain.
32 or 64 bits wide, 33 (66) MHz clock, a peak transfer rate of 264 (528) Mbytes/sec
(with 64-bit transfers: 8 bytes × 33 × 10^6 = 264 × 10^6 bytes/sec, and twice that at 66 MHz).
50 lines.
PCI Bus Lines (required)
1. System lines: including clock and reset.
2. Address & Data: 32 time-multiplexed lines for address/data; interpret & validate lines.
3. Interface Control: controls the timing of transactions and provides coordination among
initiators and targets.
4. Arbitration: not shared; direct connection to the PCI bus arbiter.
5. Error lines: used to report parity and other errors.
PCI Bus Lines (optional)
Interrupt lines: not shared.
Cache support.
64-bit Bus Extension: additional 32 time-multiplexed lines; 2 lines to enable devices to agree to use 64-bit transfers.
JTAG/Boundary Scan: for testing procedures.
Memory Hierarchy
Registers
In CPU
Internal or Main memory
May include one or more levels of cache
“RAM”
External memory
Backing store
Semiconductor Memory: RAM, ROM, PROM, EPROM, Flash Memory, EEPROM, CMOS.
Cycle times of semiconductor memories range from a few hundred
nanoseconds to less than 10 nanoseconds.
A memory unit is called RAM if any location can be accessed for a Read or
Write operation in some fixed amount of time that is independent of the
location’s address.
RAM
Read/Write at an arbitrary address (at random)
Volatile
Temporary storage
Static or dynamic
Dynamic RAM
Bits stored as a charge in capacitors
Charges leak
Need refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Main memory
Static RAM
Bits stored as on/off switches (using traditional flip-flop logic gate configurations)
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Cache
Read Only Memory (ROM)
Permanent storage
Micro-programming
Library subroutines
Systems programs (BIOS)
Function tables
CMOS – Complementary Metal-Oxide Semiconductor. CMOS memory is
intended for storing the computer’s current configuration. It stores data
while consuming practically no energy.
Types of ROM
Written during manufacture
ROM, very expensive for small runs
Programmable (once)
PROM
Needs special equipment to program (programmer)
Read “mostly”
Erasable Programmable (EPROM)
Erased by UV light (the entire storage is erased)
Electrically Erasable (EEPROM)
Takes much longer to write than to read
Flash memory (intermediate between EPROM and EEPROM): erased electrically
Cache
Cache memory is intended to give a memory speed approaching that of the
fastest memories available, and at the same time to provide this fast memory
at the price of less expensive types of semiconductor memories.
Small amount of fast memory
Sits between normal main memory and the CPU (off-chip cache)
May be located on the CPU chip or module (on-chip cache)
Cache Design
Size (the optimal size lies roughly between 1K and 512K)
Mapping Function (direct, associative, set associative; see the sketch after this list)
Replacement Algorithm (LRU, FIFO, LFU, Random)
Write Policy (information integrity) (write through, write back)
Block Size (no definitive optimum value has been found)
Number of Caches (single- or two-level, unified or split)
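The direct mapping function mentioned above can be sketched in C. This minimal example assumes a 32-bit address, 64-byte lines and 1024 cache lines; all parameters are illustrative assumptions, not values fixed by the lecture:

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_SIZE  64u    /* bytes per cache line */
    #define NUM_LINES  1024u  /* lines in the cache   */

    int main(void) {
        uint32_t addr   = 0x12345678u;
        uint32_t offset = addr % LINE_SIZE;               /* byte within the line */
        uint32_t line   = (addr / LINE_SIZE) % NUM_LINES; /* cache line index     */
        uint32_t tag    = addr / (LINE_SIZE * NUM_LINES); /* identifies the block */
        printf("tag=0x%X line=%u offset=%u\n", tag, line, offset);
        return 0;
    }

Because the line index is computed directly from the address, each memory block can map to exactly one cache line.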
The Cache Efficiency is characterized by the hit
ratio. The hit ratio is the ratio of the number of hits in the
cache to the total number of the CPU’s accesses to the
memory. For example, with a hit ratio of 0.95, a 10 ns cache and a
100 ns main memory, the average access time is
0.95 × 10 + 0.05 × 100 = 14.5 ns.
Replacement Algorithms (1)
Direct mapping
No choice
Each block only maps to one line
Replace that line
Replacement Algorithms (2): Associative & Set Associative
Hardware implemented algorithm (for speed)
Least Recently Used (LRU)
e.g., in a 2-way set associative cache:
which of the 2 blocks is LRU?
First In First Out (FIFO)
replace the block that has been in the cache longest
Least Frequently Used (LFU)
replace the block which has had the fewest hits
Random
Write Policy
Must not overwrite a cache block unless main memory is up to date
Multiple CPUs may have individual caches
I/O may address main memory directly
Write through
All writes go to main memory as well as to the cache
Multiple CPUs can monitor main memory traffic to keep the local (to the CPU) cache up to date
Lots of traffic
Slows down writes
Write back (see the sketch after this list)
Updates are initially made in the cache only
The update bit for the cache slot is set when an update occurs
If a block is to be replaced, it is written to main memory only if the update bit is set
Other caches can get out of sync
I/O must access main memory through the cache
15% of memory references are writes
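The write-back policy can be sketched as follows: writes go to the cache and set a dirty (update) bit, and main memory is written only when a dirty line is evicted. The structures and sizes below are illustrative assumptions:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define LINE_SIZE 64u

    typedef struct {
        uint32_t tag;
        bool     valid;
        bool     dirty;           /* update bit for the cache slot */
        uint8_t  data[LINE_SIZE];
    } cache_line_t;

    static uint8_t main_memory[1 << 20];

    /* Write hits update the cache only and mark the line dirty. */
    static void write_byte(cache_line_t *line, uint32_t offset, uint8_t value) {
        line->data[offset] = value;
        line->dirty = true;       /* main memory is now stale */
    }

    /* On replacement, write the line back only if its update bit is set. */
    static void evict(cache_line_t *line, uint32_t base_addr) {
        if (line->dirty)
            memcpy(&main_memory[base_addr], line->data, LINE_SIZE);
        line->valid = false;
        line->dirty = false;
    }

    int main(void) {
        cache_line_t line = { .tag = 0, .valid = true, .dirty = false, .data = {0} };
        write_byte(&line, 3, 0xFF); /* hit: cache only, line becomes dirty */
        evict(&line, 0x1000);       /* dirty, so the line is written back  */
        return 0;
    }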
Number of Caches
On-chip cache (L1): reduces the processor’s external bus activity,
speeds up execution times and increases overall system performance.
External cache (L2): if an L2 SRAM cache is used, then the information
missing in L1 can frequently be retrieved quickly; the data can be accessed using the fastest type of bus transfer.
Contemporary designs include both L1 and L2 caches. The potential savings due to the use of the L2 cache depend on the
hit rates in both the L1 and L2 caches.
Data Cache Consistency
To provide cache consistency the data cache supports the MESI protocol (modified/exclusive/shared/invalid). The data cache includes two status bits per tag, so each line can be in one of four states:
Modified: the line in the cache has been modified and differs
from that in the main memory, so it is available only in this cache.
Exclusive: the line in the cache is the same as that in the main
memory and is not present in any other cache.
Shared: the line in the cache is the same as that in the main
memory and may be present in another cache.
Invalid: the line in the cache does not contain valid data.
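The four states fit in the two status bits per tag mentioned above. A minimal C sketch (the particular 2-bit encoding is an assumption, not mandated by the protocol):

    typedef enum {
        MESI_INVALID   = 0,  /* line holds no valid data                    */
        MESI_SHARED    = 1,  /* same as main memory, may be in other caches */
        MESI_EXCLUSIVE = 2,  /* same as main memory, in this cache only     */
        MESI_MODIFIED  = 3   /* differs from main memory, this cache only   */
    } mesi_state_t;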
Cache Control
The internal cache is controlled by two bits of the control registers: CD (cache disable) and NW (not write-through).
There are two Pentium instructions that can be used to control the cache:
INVD: flushes the internal cache memory and signals the external cache (if any) to flush.
WBINVD: performs the same function, but also signals an external write-back cache to write modified blocks back before flushing.
Types of External Memory
Magnetic Disk
RAID
Removable
Optical
CD-ROM
CD-Writable (WORM)
CD-R/W
DVD
Magnetic Tape
Magnetic Disk
Metal or plastic disk coated with magnetizable material (iron oxide … rust)
Range of packaging
Floppy
Winchester hard disk
Removable hard disk
Winchester Disk Track Format
Each track contains 30 fixed-length sectors of 600 bytes each. Every sector holds 512 bytes of data, plus control information useful to the disk controller.
The ID field is a unique identifier or address used to locate a particular sector. The SYNCH byte is a special bit pattern that marks the beginning of the field. The track number identifies a track; the head number identifies a head, since this disk has multiple surfaces. The ID and data fields each contain an error-detecting code (CRC).
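The sector layout can be pictured as a C structure. This is only a sketch of the fields named above; the exact field widths and the gap bytes of the real 600-byte format are omitted, and all names are illustrative assumptions:

    #include <stdint.h>

    typedef struct {
        uint8_t  synch;       /* special bit pattern marking the start of the field */
        uint8_t  track;       /* track number                                       */
        uint8_t  head;        /* head (surface) number                              */
        uint8_t  sector;      /* sector number                                      */
        uint16_t crc;         /* error-detecting code for the ID field              */
    } id_field_t;

    typedef struct {
        id_field_t id;        /* locates this sector     */
        uint8_t    data[512]; /* 512 bytes of user data  */
        uint16_t   crc;       /* CRC over the data field */
    } sector_t;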
Characteristics of Disk Systems

Characteristic     Set of Parameters / Possible Values
Head Motion        Fixed head (one per track); Movable head (one per surface)
Disk Portability   Non-removable disk; Removable disk
Sides              Single-sided; Double-sided
Platters           Single-platter; Multiple-platter
Head Mechanism     Contact (floppy); Fixed gap; Aerodynamic gap (Winchester)
Disk Access Time
Disk Access Time is the main characteristic of Disk Performance. If movable heads are used and the disk drive is operating, then to read/write, the head must be positioned at the desired track and at the beginning of the desired sector on that track. The time it takes to position the head at the track is known as seek time. In either case, once the track is selected, the system waits until the appropriate sector rotates to line up with the head. The time it takes for the sector to reach the head is known as rotational latency. The Disk Access Time is equal to the sum of the Seek Time and the Rotational Latency time.
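A worked example of this sum, assuming a drive with a 9 ms average seek time and a 7200 rpm spindle speed (both numbers are illustrative assumptions; the average rotational latency is half a revolution):

    #include <stdio.h>

    int main(void) {
        double seek_ms     = 9.0;             /* average seek time          */
        double rpm         = 7200.0;
        double rotation_ms = 60000.0 / rpm;   /* one revolution: ~8.33 ms   */
        double latency_ms  = rotation_ms / 2; /* average: half a revolution */
        printf("access time = %.2f ms\n", seek_ms + latency_ms); /* 13.17 */
        return 0;
    }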
RAID (six [seven] levels of RAID). With the use of multiple disks, there is a wide variety of ways in which the data can be organised and in which redundancy can be
added to improve reliability. Industry has agreed on a standardised scheme for multiple-disk database design, known as RAID (Redundant Array of Independent Disks). The RAID scheme consists of six levels. These levels do not imply a
hierarchical relationship but designate different design architectures
that share three common characteristics:
RAID is a set of physical disk drives viewed by the operating system as a single logical
drive.
Data is distributed across the physical drives of an array.
Redundant disk capacity is used to store parity information,
which guarantees data recoverability in
case of a disk failure.
RAID systems of different levels differ in the methods used to realise the second
and the third characteristics.
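The parity information used by several RAID levels is a bitwise XOR of the data strips, so any single lost strip can be recovered by XOR-ing the survivors. A minimal sketch (strip size and disk count are illustrative assumptions):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t d0 = 0x5A, d1 = 0x3C, d2 = 0xF0; /* data strips on 3 disks */
        uint8_t parity = d0 ^ d1 ^ d2;           /* stored on a 4th disk   */

        /* Disk 1 fails: recover d1 from the parity and the remaining strips. */
        uint8_t recovered = parity ^ d0 ^ d2;
        printf("recovered 0x%02X (expected 0x3C)\n", recovered);
        return 0;
    }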
Virtual-Memory Technique
Usually only some parts of a program that are executed are brought into the main memory at first; when a new part (segment) of a program is to be moved into the full memory, it must replace another segment already in the memory. In modern computers, the operating system moves programs and data automatically between the main memory and secondary storage. Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques. The virtual-memory mechanism bridges the size and speed gaps between the main memory and secondary storage and is usually implemented in part by software techniques. Programs, and hence the processor, reference an instruction and data space that is independent of the available physical main memory space. The binary addresses that the processor issues for either instructions or data are called virtual or logical addresses.
A special hardware unit, called the Memory Management Unit (MMU),
translates virtual addresses into physical addresses. When the desired data
are in the main memory, these data are fetched as in the cache
mechanism. If the data are not in the main memory, the MMU causes
the operating system to bring the data into the memory from the disk.
The transfer of data between the disk and the main memory is performed
using DMA.
Address Translation. A simple method for translating virtual addresses into physical addresses is to assume that all programs and data are composed of fixed-length units called pages. Each page consists of a block of words that occupy contiguous locations in the main memory. Pages commonly range from 2K to 16K bytes in length. They constitute the basic unit of information that is moved between the main memory and the disk whenever the translation mechanism determines that a move is required. Pages should not be too small, because the access time of a magnetic disk is much longer than that of the main memory.
In a virtual-memory address translation method based on the concept of fixed-length pages, each virtual address generated by the processor, whether it is for an instruction fetch or an operand fetch/store operation, is interpreted as a virtual page number (high-order bits) followed by an offset (low-order bits) that specifies the location of a particular byte (or word) within a page. Information about the main-memory location of each page is kept in a page table. This information includes the main-memory address where the page is stored and the current status of the page. An area in the main memory that can hold one page is called a page frame. The starting address of the page table is kept in a page table base register. By adding the virtual page number to the contents of this register, the address of the corresponding entry in the page table is obtained. The contents of this location give the starting address of the page if that page currently resides in the main memory.
The page table information is used by the MMU for every read and write
access. An access to every word specified by a virtual address demands two
operations with the main memory. Thus, a straightforward virtual-memory
scheme would have the effect of doubling the memory access time. To
overcome this problem, most virtual-memory schemes make use of a special
cache for page table entries. The page table is kept in the main memory;
however, a copy of a small portion of the page table can be accommodated
within the MMU. This portion consists of the page table entries that
correspond to the most recently accessed pages. A small cache, usually
called the Translation Lookaside Buffer (TLB),
is incorporated into the MMU for this purpose. In addition
to the information that constitutes a page table entry, the TLB includes the
virtual address of the entry.
Address translation proceeds as follows. Given a virtual address, the
MMU looks in the TLB for the referenced page. If the page table entry for
this page is found in the TLB, the physical address is obtained immediately.
If there is a miss in the TLB, then the required entry is obtained from the
page table in the main memory, and the TLB is updated.
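The page-based translation can be sketched in C. This minimal example assumes 4K pages and a 32-bit virtual address; the page-table layout and all names are illustrative assumptions, not the MMU's real interface (the TLB, a small cache of these entries, is omitted):

    #include <stdint.h>

    #define PAGE_SIZE  4096u
    #define PAGE_SHIFT 12u                /* log2(PAGE_SIZE) */

    typedef struct {
        uint32_t frame;                   /* page frame number in main memory  */
        int      present;                 /* page currently resides in memory  */
    } pte_t;

    /* page_table plays the role of the in-memory table the MMU consults. */
    static uint32_t translate(const pte_t *page_table, uint32_t virt) {
        uint32_t vpn    = virt >> PAGE_SHIFT;     /* virtual page number */
        uint32_t offset = virt & (PAGE_SIZE - 1); /* byte within page    */
        const pte_t *e  = &page_table[vpn];
        if (!e->present) {
            /* page fault: the OS would bring the page in from the disk */
            return 0;
        }
        return (e->frame << PAGE_SHIFT) | offset; /* physical address */
    }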
Questions
1. In your own words explain the following notions(concepts) and give
examples:
a) data, information, format;
b) computer (analog, digital, hybrid);
c) hardware, software, computer configuration;
d) function, structure, interface;
e) architecture, organization.
2. List the major components of a contemporary computer system and
indicate their functions.
3. List the operations which you use most often when you work with a
computer and explain which of the computer’s major components are
engaged in the process of executing one of these operations.
4. Analyze the 5 definitions of Computer Architecture given below.
Which of these definitions corresponds better than the others to the
officially accepted one? (Give a detailed explanation).
1)”The design of the integrated system which provides a useful tool
to the programmer” (Bear)
2)”The study of structure, behavior and design of computers”
(Hayes)
3)”The design of the system specification at a general or subsystem
level”(Abd-Alla)
4)”The art of designing a machine that will be pleasure to work
with”(Foster)
5)”The interface between the hardware and the lowest level
software”(Hennessy and Patterson).
5. State the minimal number of levels of a virtual machine which can
execute all main computer functions (give an explanation).
6. What is the difference between translator and interpreter?
7. Why are computer hardware and computer software considered
logically equivalent?
8. Describe Architecture and Structure Organization of computers of I,
II, III and IV generations, compare them.
9. Formulate and analyze Key Concepts of von Neumann Architecture.
10. Describe the functional structure of von Neumann machine.
11. Describe the functional structure of IAS. List elements of Architecture
and Structure Organization (details) of IAS.
12. List and describe base electronic components of contemporary
computer.
13. Formulate and analyze Moore’s Law.
14. What’s Computer System Performance? List the basic characteristics
of Computer System Performance.
15. What’s Hardwired Program? (What’s programming in Hardware?)
16. What’s Software Program? (What’s programming in Software?)
17. Describe the functional structure of Computer components (Top level
View) in the eye of Interconnection Subsystem.
18. What’s the Main Cycle of Instruction Processing (MCIP)?
19. Describe the architecture of “Hypothetical Machine”. What is the
difference between translator and interpreter?
20. Describe each step of MCIP on the “Hypothetical Machine” for one
concrete instruction.
21. Describe each step of MCIP on the IAS for one concrete instruction.
22. What do we mean by Interrupts? What is the main reason for
using the Interrupt Mechanism?
23. Draw up diagrams of the Program Flow Control without interrupts
and with interrupts, describe each fragment of the Program Flow Control.
24. Which classes of interrupts must be enabled constantly? (Give an
explanation).
25. Describe the mechanism of working with interrupts.
26. In the diagram “Program Flow Control” find the points which
correspond to interrupts of the user’s program and explain the necessity of
using these interrupts.
27. How many techniques of executing I/O operations are used? Describe
each of these techniques and compare them.
28. Describe the Direct Memory Access technique.
29. Draw a scheme of DMA Transfer in Computer System.
30. Which approaches can be taken to dealing with multiple interrupts?
Show advantages and disadvantages of these approaches.
31. What is the interconnection structure, and by which factors is it
determined?
32. List the types of exchanges (input and output) that are
characteristic of each module, draw up a sketch for the CPU
module (indicate the major forms of input and output) and explain
from which modules the CPU receives data (What kind of operations
are specific to the CPU module?).
33. What kind of buses does the System Bus include? What function does
each of these buses carry out?
34. What do we call the width of a bus? Which parameters of the
Computer System are determined by the widths of the buses included in
the System Bus?
35. What operation does the control signal “I/O read” set?
36. What problems may arise, when only one (single) bus is used in a
computer system?
37. Give examples of using multiple bus structures in computer systems
and explain the necessity of including each of the buses in the system.
38. List and describe main generic types of buses.
39. Which methods of arbitration are used now? What’s the difference
between these methods?
40. Describe the existing methods of access to different types of memory.
41. Which parameters are used for estimating the performance of memory
devices? What does each of these parameters characterize?
42. Explain the necessity of employing a memory hierarchy.
43. What is RAM? Describe distinguishing characteristics of RAM.
What’s the difference between DRAM and SRAM?
44. What is ROM?
45. Explain the necessity of the implementation of EPROM (EEPROM,
Flash Memory).
46. What is the main purpose of Cache Memory implementation?
47. Describe the principles of Cache Memory operation.
48. Enumerate elements of Cache Design.
49. Draw up a block-diagram of Pentium processor and explain
functions of its main nodes.
50. How is the Data Cache Consistency ensured?
52. Why can RAID 0 not be considered a true member of the RAID
family? Compare RAID 5 and RAID 6 (illustrate the answer with
pictures).
53. List the well-known Optical Disk products and describe their
characteristics.
54. Give an example of CD-ROM block formats.
55. List the major characteristics of Disk System.
56. How is the Disk Access Time evaluated? What does the Disk
Access Time characterize? What is RAID? List three common
characteristics of RAID.
57. Describe the typical Disk data layout (draw a picture).
58. How are sector positions within a track identified? Give an
example of a disk track format (describe the meaning of each field).
59. Give a definition of Virtual Memory Techniques. Draw up a
scheme of Virtual Memory Organization and explain the role of the
MMU in this scheme.
60. Describe the process of Virtual Memory Address Translation (draw
up a scheme of this process).