The Embedded Hardware Architecture @ ISA MODELS


Embedded Hardware: Building Blocks and the Embedded Board

In This Chapter

• Introducing the importance of being able to read a schematic diagram

• Discussing the major components of an embedded board

• Introducing the factors that allow an embedded device to work

• Discussing the fundamental elements of electronic components


Learn to Read a Schematic

Blueprint Reading


ALPHABET OF LINES

• Universal language for designers, engineers, and production personnel.

• Uses lines, numbers, symbols, and illustrations.

Different Blueprint Forms:

• Drawings for fabrication (standardized symbols for mechanical, welding, construction, electrical wiring, and assembly).

• Sketches (illustrate an idea, technical principle, or function).

Lines are drawn in definite standard forms, each with a specific meaning:

• Thickness of a line (thick or thin)

• Solid

• Broken

• Dashed


A. ALPHABET OF LINES

Von Neumann Model

[Figure: block diagram of the von Neumann model with five blocks: MEMORY (MAR, MDR), PROCESSING UNIT (ALU, TEMP), CONTROL UNIT (IR, PC), INPUT (keyboard, mouse, scanner, disk), and OUTPUT (monitor, printer, LED, disk).]


Discussion of the internal processor design as related to the von Neumann model


Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Memory

$2^k \times m$ array of stored bits

Address
• unique ($k$-bit) identifier of location

Contents
• $m$-bit value stored in location

Basic Operations:

LOAD
• read a value from a memory location

STORE
• write a value to a memory location

[Figure: memory array with 4-bit addresses running from 0000 to 1111; two of the locations are shown holding the 8-bit values 00101101 and 10100010.]

Interface to Memory

How does the processing unit get data to/from memory?

MAR: Memory Address Register

MDR: Memory Data Register

To LOAD a location (A):

1. Write the address (A) into the MAR.

2. Send a “read” signal to the memory.

3. Read the data from MDR.

To STORE a value (X) to a location (A):

1. Write the data (X) to the MDR.

2. Write the address (A) into the MAR.

3. Send a “write” signal to the memory.
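The LOAD and STORE sequences above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Memory` class, its `read_signal`/`write_signal` methods, and the choice of k = 4 and m = 8 are hypothetical names and sizes, not part of the LC-3 or of the slides.

```python
class Memory:
    """Toy 2**k x m memory that is accessed only through MAR and MDR."""

    def __init__(self, k: int, m: int) -> None:
        self.k = k
        self.m = m
        self.cells = [0] * (2 ** k)   # 2**k locations, each m bits wide
        self.mar = 0                  # Memory Address Register
        self.mdr = 0                  # Memory Data Register

    def read_signal(self) -> None:
        """On a "read" signal, copy the addressed cell into the MDR."""
        self.mdr = self.cells[self.mar]

    def write_signal(self) -> None:
        """On a "write" signal, copy the MDR into the addressed cell."""
        self.cells[self.mar] = self.mdr


def load(mem: Memory, address: int) -> int:
    mem.mar = address % (2 ** mem.k)   # 1. write the address into the MAR
    mem.read_signal()                  # 2. send a "read" signal
    return mem.mdr                     # 3. read the data from the MDR


def store(mem: Memory, address: int, value: int) -> None:
    mem.mdr = value % (2 ** mem.m)     # 1. write the data into the MDR
    mem.mar = address % (2 ** mem.k)   # 2. write the address into the MAR
    mem.write_signal()                 # 3. send a "write" signal


mem = Memory(k=4, m=8)
store(mem, 0b0100, 0b00101101)
print(format(load(mem, 0b0100), "08b"))   # prints 00101101
```

The modulo operations simply truncate addresses to k bits and data to m bits, which is one plausible way to model fixed-width registers.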


Processing Unit

Functional Units
• ALU = Arithmetic and Logic Unit
• could have many functional units, some of them special-purpose (multiply, square root, …)
• LC-3 performs ADD, AND, NOT

Registers
• Small, temporary storage
• Operands and results of functional units
• LC-3 has eight registers (R0, …, R7), each 16 bits wide

Word Size
• number of bits normally processed by ALU in one instruction
• also width of registers
• LC-3 is 16 bits


Input and Output

Devices for getting data into and out of computer memory

Each device has its own interface, usually a set of registers like the memory’s MAR and MDR
• LC-3 supports keyboard (input) and monitor (output)
• keyboard: data register (KBDR) and status register (KBSR)
• monitor: data register (DDR) and status register (DSR)

Some devices provide both input and output
• disk, network

A program that controls access to a device is usually called a driver.


Control Unit

Orchestrates execution of the program

Instruction Register (IR) contains the current instruction.

Program Counter (PC) contains the address of the next instruction to be executed.

Control unit:
• reads an instruction from memory
– the instruction’s address is in the PC
• interprets the instruction, generating signals that tell the other components what to do
– an instruction may take many machine cycles to complete
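A minimal sketch of the fetch, decode, and execute loop described above. The tuple instruction format and the `HALT` opcode are assumptions made for illustration; this is not the LC-3 encoding, it only borrows the three ALU operations (ADD, AND, NOT) and the eight 16-bit registers mentioned earlier.

```python
def run(program, registers):
    """Run a toy program; each instruction is a tuple (opcode, dst, src1, src2)."""
    pc = 0                               # Program Counter: address of the next instruction
    while pc < len(program):
        ir = program[pc]                 # FETCH: instruction at address PC goes into IR
        pc += 1                          # PC now points at the following instruction
        opcode, dst, a, b = ir           # DECODE: split IR into its fields
        if opcode == "ADD":              # EXECUTE: tell the ALU what to do
            registers[dst] = (registers[a] + registers[b]) & 0xFFFF
        elif opcode == "AND":
            registers[dst] = registers[a] & registers[b]
        elif opcode == "NOT":
            registers[dst] = ~registers[a] & 0xFFFF
        elif opcode == "HALT":           # hypothetical opcode that stops the loop
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return registers


regs = run(
    [("ADD", 0, 1, 2), ("NOT", 3, 0, None), ("HALT", None, None, None)],
    [0, 5, 7, 0, 0, 0, 0, 0],            # eight 16-bit registers, R0..R7
)
print(regs[0], regs[3])                  # prints 12 65523
```

Masking with `0xFFFF` keeps every result within the 16-bit word size.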


Internal Processor Design as Related to the von Neumann Model

• Processors are the main functional units of an embedded board, and are primarily responsible for processing instructions and data.

• An electronic device contains at least one master processor, acting as the central controlling device, and can have additional slave processors that work with and are controlled by the master processor.

• These slave processors may either extend the instruction set of the master processor or act to manage memory, buses, and I/O (input/output) devices.


Powering the Hardware

• Once you’ve soldered in the components needed for the power supply, power up the board and check that the supply works. Check that power is present on every pad where you expect it, and check the ground pads to make sure there is no voltage where none should be.

• Next, solder in the power-decoupling capacitors for the ICs. Add in the processor’s oscillator and decoupling capacitors. If the oscillator is a module, check its operation with an oscilloscope.

• If IC sockets are used, solder these next, then insert the components. If you’re using processors that need to be externally reprogrammed, then sockets are a good idea.


Variations chart

Computer Architecture’s Changing Definition

• 1950s to 1960s: Computer Architecture Course = Computer Arithmetic

• 1970s to mid 1980s: Computer Architecture Course = Instruction Set Design, especially ISA appropriate for compilers

• 1990s: Computer Architecture Course = Design of CPU, memory system, I/O system, Multiprocessors

Review

• Amdahl’s Law:

$$\text{Speedup}(E) = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}} = \frac{1}{(1 - F) + \dfrac{F}{S}}$$

• CPU Time & CPI:

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{clock rate}}$$


Instruction Set Architecture (ISA)

[Figure: the instruction set drawn as the boundary between software and hardware.]

ISA Level

The ISA level is the interface between the compilers and the hardware.


Overview of the Pentium 4 ISA Level

The Pentium 4’s primary registers.

Instruction Set Architecture (ISA)

(Serves as an interface between software and hardware)

• Provides a mechanism by which the software tells the hardware what should be done.

[Figure: translation chain on the software side of the instruction set. High-level language code (C, C++, Java, Fortran, ...) is turned by the compiler into assembly language code (architecture-specific statements), which the assembler turns into machine language code (architecture-specific bit patterns). The instruction set separates this software from the hardware.]


Instruction Set Architecture

• Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine.

• The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer.

Interface Design

Properties of a good interface:

• Lasts through many implementations (portability, compatibility)

• Is used in many different ways (generality)

• Provides convenient functionality to higher levels

• Permits an efficient implementation at lower levels

[Figure: a single interface with several uses on one side and successive implementations (imp 1, imp 2, imp 3) on the other, plotted against time.]

Evolution of Instruction Sets

• Single Accumulator (EDSAC, 1950)

• Accumulator + Index Registers (Manchester Mark I, IBM 700 series, 1953)

• Separation of Programming Model from Implementation
– High-level Language Based (B5000, 1963)
– Concept of a Family (IBM 360, 1964)

• General Purpose Register Machines
– Complex Instruction Sets (Vax, Intel 432, 1977-80)
– Load/Store Architecture (CDC 6600, Cray 1, 1963-76)

• RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC . . ., 1987)

• LIW/“EPIC”? (IA-64 . . ., 1999)

Evolution of Instruction Sets

• Major advances in computer architecture are typically associated with landmark instruction set designs

• Design decisions must take into account:
1. technology
2. machine organization
3. programming languages
4. compiler technology
5. operating systems

• And they in turn influence these

What Are the Components of an ISA?

• Sometimes known as The Programmer’s Model of the machine

• Storage cells
– General and special purpose registers in the CPU
– Many general purpose cells of same size in memory
– Storage associated with I/O devices

• The machine instruction set
– The instruction set is the entire repertoire of machine operations
– Makes use of storage cells, formats, and results of the fetch/execute cycle
– i.e., register transfers

What Are the Components of an ISA?

• The instruction format
– Size and meaning of fields within the instruction

• The nature of the fetch-execute cycle
– Things that are done before the operation code is known

What Must an Instruction Specify? (I)

Data Flow

• Which operation to perform: add r0, r1, r3
– Ans: Op code: add, load, branch, etc.

• Where to find the operands: add r0, r1, r3
– In CPU registers, memory cells, I/O locations, or part of instruction

• Place to store result: add r0, r1, r3
– Again CPU register or memory cell

What Must an Instruction Specify? (II)

• Location of next instruction:
    add r0, r1, r3
    br endloop
– Almost always memory cell pointed to by program counter (PC)

(Sometimes there is no operand, or no result, or no next instruction.)
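To make the idea of instruction fields concrete, here is a small sketch that packs `add r0, r1, r3` into a 16-bit word and unpacks it again. The layout (4-bit opcode, three 3-bit register fields, 3 unused bits), the opcode numbers, and the choice of the first register as the destination are all assumptions for illustration; the slides do not define an encoding.

```python
OPCODES = {"add": 0b0001, "and": 0b0101}      # hypothetical opcode numbers


def encode(op: str, dst: int, src1: int, src2: int) -> int:
    """Pack the fields into one 16-bit word: opcode | dst | src1 | src2 | unused."""
    return (OPCODES[op] << 12) | (dst << 9) | (src1 << 6) | (src2 << 3)


def decode(word: int) -> tuple[str, int, int, int]:
    """Recover (operation, destination, source 1, source 2) from a 16-bit word."""
    names = {number: name for name, number in OPCODES.items()}
    opcode = (word >> 12) & 0xF               # which operation to perform
    dst = (word >> 9) & 0x7                   # place to store the result
    src1 = (word >> 6) & 0x7                  # where to find the operands
    src2 = (word >> 3) & 0x7
    return names[opcode], dst, src1, src2


word = encode("add", 0, 1, 3)
print(format(word, "016b"))                   # prints 0001000001011000
print(decode(word))                           # prints ('add', 0, 1, 3)
```

The location of the next instruction does not appear in the word at all, which matches the remark that it is almost always implied by the program counter.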


Types of Operations

• Arithmetic and Logic: AND, ADD

• Data Transfer: MOVE, LOAD, STORE

• Control: BRANCH, JUMP, CALL

• System: OS CALL, VM

• Floating Point: ADDF, MULF, DIVF

• Decimal: ADDD, CONVERT

• String: MOVE, COMPARE

• Graphics: (DE)COMPRESS

Instructions Can Be Divided into 3 Classes (I)

• Data movement instructions
– Move data from a memory location or register to another memory location or register without changing its form
– Load: source is memory and destination is register
– Store: source is register and destination is memory

• Arithmetic and logic (ALU) instructions
– Change the form of one or more operands to produce a result stored in another location
– Add, Sub, Shift, etc.

• Branch instructions (control flow instructions)
– Alter the normal flow of control from executing the next instruction in sequence
– Br Loc, Brz Loc2: unconditional or conditional branches

Classifying ISAs

Accumulator (before 1960):
    1 address    add A             acc <- acc + mem[A]

Stack (1960s to 1970s):
    0 address    add               tos <- tos + next

Memory-Memory (1970s to 1980s):
    2 address    add A, B          mem[A] <- mem[A] + mem[B]
    3 address    add A, B, C       mem[A] <- mem[B] + mem[C]

Register-Memory (1970s to present):
    2 address    add R1, A         R1 <- R1 + mem[A]
                 load R1, A        R1 <- mem[A]

Register-Register (Load/Store) (1960s to present):
    3 address    add R1, R2, R3    R1 <- R2 + R3
                 load R1, R2       R1 <- mem[R2]
                 store R1, R2      mem[R1] <- R2


Classifying ISAs

Code Sequence C = A + B for Four Instruction Sets

Stack        Accumulator    Register (register-memory)    Register (load-store)
Push A       Load A         Load R1, A                    Load R1, A
Push B       Add B          Add R1, B                     Load R2, B
Add          Store C        Store C, R1                   Add R3, R1, R2
Pop C                                                     Store C, R3

Effect of the add step: accumulator: acc = acc + mem[B]; register-memory: R1 = R1 + mem[B]; load-store: R3 = R1 + R2.

Stack Architectures

• Instruction set:
add, sub, mult, div, . . .
push A, pop A

• Example: A*B - (A+C*B)

Instruction    Stack after the instruction (top first)
push A         A
push B         B, A
mul            A*B
push A         A, A*B
push C         C, A, A*B
push B         B, C, A, A*B
mul            B*C, A, A*B
add            A+B*C, A*B
sub            result
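The stack example can be checked with a short interpreter. This is a sketch under assumptions: `run_stack` is a hypothetical helper, variables live in a Python dictionary standing in for memory, and `sub` is taken to compute next minus top, which is what the example needs to produce A*B - (A+C*B).

```python
def run_stack(program, memory):
    """Execute zero-address instructions; operands are implicit on the stack."""
    stack = []
    for instr in program:
        op, *arg = instr.split()
        if op == "push":
            stack.append(memory[arg[0]])     # copy a memory value onto the stack
        elif op == "pop":
            memory[arg[0]] = stack.pop()     # move the top of stack into memory
        else:
            tos = stack.pop()                # top of stack
            nxt = stack.pop()                # element just below the top
            if op == "add":
                stack.append(nxt + tos)
            elif op == "sub":
                stack.append(nxt - tos)      # assumed operand order: next - top
            elif op == "mul":
                stack.append(nxt * tos)
            else:
                raise ValueError(f"unknown instruction {instr!r}")
    return stack


program = ["push A", "push B", "mul", "push A", "push C", "push B", "mul", "add", "sub"]
values = {"A": 2, "B": 3, "C": 4}
print(run_stack(program, values))            # prints [-8]
```

With A = 2, B = 3, C = 4 the expression is 6 - (2 + 12) = -8, and no instruction other than push names an operand.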

Stacks: Pros and Cons

• Pros
– Good code density (implicit operand addressing top of stack)
– Low hardware requirements
– Easy to write a simpler compiler for stack architectures

• Cons
– Stack becomes the bottleneck
– Little ability for parallelism or pipelining
– Data is not always at the top of the stack when needed, so additional instructions like TOP and SWAP are needed
– Difficult to write an optimizing compiler for stack architectures

Accumulator Architectures

• Instruction set:
add A, sub A, mult A, div A, . . .
load A, store A

• Example: A*B - (A+C*B)

Instruction    Accumulator after the instruction
load B         B
mul C          B*C
add A          A+B*C
store D        A+B*C (also copied to D)
load A         A
mul B          A*B
sub D          result
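A small interpreter makes the accumulator sequence easy to trace and also counts memory accesses. It is a sketch under assumptions: `run_accumulator` and the dictionary used as memory are hypothetical, and every instruction is modelled as exactly one memory access because each one names a memory operand.

```python
def run_accumulator(program, memory):
    """Execute one-address instructions; the accumulator is the implicit operand."""
    acc = 0
    memory_accesses = 0
    for instr in program:
        op, addr = instr.split()
        if op == "load":
            acc = memory[addr]
        elif op == "store":
            memory[addr] = acc
        elif op == "add":
            acc += memory[addr]
        elif op == "sub":
            acc -= memory[addr]
        elif op == "mul":
            acc *= memory[addr]
        else:
            raise ValueError(f"unknown instruction {instr!r}")
        memory_accesses += 1                 # every instruction touches memory once
    return acc, memory_accesses


program = ["load B", "mul C", "add A", "store D", "load A", "mul B", "sub D"]
values = {"A": 2, "B": 3, "C": 4}
print(run_accumulator(program, values))      # prints (-8, 7)
```

Seven instructions cause seven memory accesses, including a spill of the intermediate value to D because there is nowhere else to keep it.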

Accumulators: Pros and Cons

• Pros
– Very low hardware requirements
– Easy to design and understand

• Cons
– Accumulator becomes the bottleneck
– Little ability for parallelism or pipelining
– High memory traffic

Memory-Memory Architectures

• Instruction set:
(3 operands) add A, B, C    sub A, B, C    mul A, B, C

• Example: A*B - (A+C*B)
– 3 operands

mul D, A, B
mul E, C, B
add E, A, E
sub E, D, E

Memory-Memory: Pros and Cons

• Pros

– Requires fewer instructions (especially if 3 operands)

– Easy to write compilers for (especially if 3 operands)

• Cons

– Very high memory traffic (especially if 3 operands)

– Variable number of clocks per instruction (especially if 2 operands)

– With two operands, more data movements are required

Register-Memory Architectures

• Instruction set:
add R1, A    sub R1, A    mul R1, B
load R1, A    store R1, A

• Example: A*B - (A+C*B)

load R1, C
mul R1, B /* C*B */
add R1, A /* A + C*B */
store R1, D
load R2, A
mul R2, B /* A*B */
sub R2, D /* A*B - (A + C*B) */

Register-Memory: Pros and Cons

• Pros
– Some data can be accessed without loading first
– Instruction format easy to encode
– Good code density

• Cons
– Operands are not equivalent (poor orthogonality)
– Variable number of clocks per instruction
– May limit number of registers

Load-Store Architectures

• Instruction set:
add R1, R2, R3    sub R1, R2, R3    mul R1, R2, R3
load R1, R4    store R1, R4

• Example: A*B - (A+C*B)

load R1, &A
load R2, &B
load R3, &C
load R4, R1
load R5, R2
load R6, R3
mul R7, R6, R5 /* C*B */
add R8, R7, R4 /* A + C*B */
mul R9, R4, R5 /* A*B */
sub R10, R9, R8 /* A*B - (A+C*B) */
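The load-store sequence can be traced with a sketch like the one below. Assumptions: `run_load_store` is a hypothetical helper, `load Rx, &V` is treated as placing the address of V in a register without touching memory, and memory is a plain Python list indexed by those addresses.

```python
def run_load_store(program, memory, addresses):
    """Execute a load-store program; only load/store instructions touch memory."""
    regs = {}
    memory_accesses = 0
    for line in program:
        op, operands = line.split(maxsplit=1)
        args = [a.strip() for a in operands.split(",")]
        if op == "load" and args[1].startswith("&"):
            regs[args[0]] = addresses[args[1][1:]]     # address of a variable, no memory access
        elif op == "load":
            regs[args[0]] = memory[regs[args[1]]]      # R_dst <- mem[R_src]
            memory_accesses += 1
        elif op == "store":
            memory[regs[args[0]]] = regs[args[1]]      # mem[R_first] <- R_second
            memory_accesses += 1
        else:
            a, b = regs[args[1]], regs[args[2]]        # ALU operands come from registers only
            regs[args[0]] = {"add": a + b, "sub": a - b, "mul": a * b}[op]
    return regs, memory_accesses


program = [
    "load R1, &A", "load R2, &B", "load R3, &C",
    "load R4, R1", "load R5, R2", "load R6, R3",
    "mul R7, R6, R5", "add R8, R7, R4", "mul R9, R4, R5", "sub R10, R9, R8",
]
regs, accesses = run_load_store(program, [2, 3, 4], {"A": 0, "B": 1, "C": 2})
print(regs["R10"], accesses)                           # prints -8 3
```

Ten instructions are executed, but only the three operand loads reach memory; all arithmetic stays in registers.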

Load-Store: Pros and Cons

• Pros
– Simple, fixed length instruction encoding
– Instructions take similar number of cycles
– Relatively easy to pipeline

• Cons
– Higher instruction count
– Not all instructions need three operands
– Dependent on good compiler

FLPU = Floating Point Operations Unit
PFCU = Prefetch Control Unit
AOU = Atomic Operations Unit
MMU = Memory-Management Unit
MAR = Memory Address Register
MDR = Memory Data Register
BIU = Bus Interface Unit
ARS = Application Register Set
FRS = File Register Set
SRS = Single Register Set

Processor Performance

Performance: a measure of how fast something works.

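Amdahl’s Law

Single Enhancement

F: Fraction enhanced, S: Speedup enhanced

$$\text{Speedup} = \frac{1}{(1 - F) + \dfrac{F}{S}}$$

[Figure: execution time without E is split into an unaffected part (1 - F) and an affected part F; execution time with E keeps the unaffected part (1 - F) and shrinks the affected part to F/S.]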

Ex: Amdahl’s Law (I)

Floating point instructions are improved to run 2X faster, but only 10% of actual instructions are FP.

What is the Speedup?

F = 0.1, S = 2

$$\text{Speedup} = \frac{1}{(1 - 0.1) + \dfrac{0.1}{2}} = \frac{1}{0.95} = 1.053$$

Make Common Case Fast

Enhance the parts of the program that are used most often, so ‘execution time affected by improvement’ is as large as possible.
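The single-enhancement formula is short enough to check directly. The function name `amdahl_speedup` is a hypothetical helper; the second call, with a very large S, is an added illustration of the upper bound 1/(1 - F) and is not part of the original example.

```python
def amdahl_speedup(fraction: float, speedup: float) -> float:
    """Overall speedup when `fraction` of the time is made `speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


print(round(amdahl_speedup(0.1, 2), 3))       # prints 1.053
print(round(amdahl_speedup(0.1, 1e12), 3))    # prints 1.111, the limit 1 / (1 - F)
```

Even an arbitrarily fast floating point unit cannot push the overall speedup past about 1.11 when only 10% of the work is affected, which is the point of making the common case fast.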

Amdahl’s Law (II)

Multiple Enhancements

F1, F2, F3: Fraction enhanced, S1, S2, S3: Speedup enhanced

$$\text{Speedup} = \frac{1}{\left(1 - \sum_{i=1}^{n} F_i\right) + \sum_{i=1}^{n} \dfrac{F_i}{S_i}}$$

[Figure: execution time without E is split into an unaffected part 1 – (F1+F2+F3) and affected parts F1, F2, F3; execution time with E keeps the unaffected part and shrinks each affected part to Fi/Si.]

Ex: Amdahl’s Law (II)

Three CPU performance enhancements with the following speedups and percentages of the execution time:

1) Percentage F1: 20%, Enhanced Speedup S1: 10
2) Percentage F2: 15%, Enhanced Speedup S2: 15
3) Percentage F3: 10%, Enhanced Speedup S3: 30

Assumption: Each enhancement affects a different portion of the code and only one enhancement can be used at a time.

What is the Total Speedup?

$$\text{Speedup} = \frac{1}{\left(1 - \sum_{i=1}^{3} F_i\right) + \sum_{i=1}^{3} \dfrac{F_i}{S_i}} = \frac{1}{0.55 + 0.0333} = 1.71$$
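The multiple-enhancement case generalizes to any number of (F, S) pairs. The helper name `amdahl_multi` and the list-of-pairs input are assumptions for illustration; the arithmetic follows the formula above and relies on the stated assumption that the enhanced portions do not overlap.

```python
def amdahl_multi(enhancements):
    """Speedup for non-overlapping enhancements given as (fraction, speedup) pairs."""
    total_fraction = sum(f for f, _ in enhancements)
    if total_fraction > 1.0:
        raise ValueError("fractions must not add up to more than 1")
    unaffected = 1.0 - total_fraction                   # 1 - sum(F_i)
    affected = sum(f / s for f, s in enhancements)      # sum(F_i / S_i)
    return 1.0 / (unaffected + affected)


print(round(amdahl_multi([(0.20, 10), (0.15, 15), (0.10, 30)]), 2))   # prints 1.71
```

The unaffected 55% of the execution time dominates the denominator, so the three large local speedups combine into an overall gain of only about 1.7.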

Execution Time

CPU Time
- doesn’t count I/O or time spent running other programs
- system CPU time: CPU time spent in the operating system
- user CPU time: CPU time spent in the program

Our Focus: user CPU time

(CPU) Execution Time = IC * CPI * cycle time

Elapsed Time
- Counts everything (disk and memory access, I/O, etc.)
- A useful number, but often not good for comparison purposes

Ex: CPU Execution Time

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

A program is running on a RISC machine with the following:
- 40,000,000 instructions
- 6 cycles/instruction
- 1 GHz clock rate

What is the CPU execution time for this program?

$$\text{CPU Exec. Time} = IC \times CPI \times \text{Clock cycle time} = 40{,}000{,}000 \times 6 \times 1 \times 10^{-9} = 0.24 \text{ seconds}$$

Ex: Performance

A program is running on a RISC machine with the following:
- 20,000,000 instructions
- 5 cycles/instruction
- 1 GHz clock rate

Using the same program with a new compiler:
- 5,000,000 instructions
- 2 cycles/instruction
- 1 GHz clock rate

What is the speedup with the changes?

Speedup = old execution time / new execution time = 0.1 s / 0.01 s = 10 (times faster after change)
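Both CPU-time examples follow from one formula, so a short check is easy to write. The helper `cpu_time` is a hypothetical name; it takes the clock rate in hertz, so the clock cycle time is its reciprocal.

```python
def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# 40,000,000 instructions at 6 cycles each on a 1 GHz clock
print(cpu_time(40_000_000, 6, 1e9))          # prints 0.24

# Same program before and after the compiler change
old = cpu_time(20_000_000, 5, 1e9)           # 0.1 seconds
new = cpu_time(5_000_000, 2, 1e9)            # 0.01 seconds
print(round(old / new, 3))                   # prints 10.0
```

The clock rate is unchanged in the second example, so the tenfold speedup comes entirely from the lower instruction count (4x) and the lower CPI (2.5x).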

Ex: Instruction Classes & CPI

Compute the CPU clock cycles and average CPI for the following program:

Inst. type       ICi    CPIi
ALU              20     4
Data transfer    20     5
Control          10     3

(Sol) CPU clock cycles = 20*4 + 20*5 + 10*3 = 210

Average CPI = 210/50 = 4.2

Ex: CPI and Instruction FREQi

Compute the average (effective) CPI for the following:

Inst. type       CPIi    FREQi
ALU              3       40% (0.4)
Data transfer    4       40% (0.4)
Control          2       20% (0.2)

(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2

Ex: Peak CPI

Compute the Peak CPI for the following:

Inst. type       CPIi    FREQi    FREQi (peak case)
ALU              3       40%      0%
Data transfer    4       40%      0%
Control          2       20%      100%

(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0

Ex: Average CPI and Average MIPS

Compute the average (effective) CPI for the following:

Inst. type       CPIi    FREQi
ALU              3       40% (0.4)
Data transfer    4       40% (0.4)
Control          2       20% (0.2)

(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2

If the processor is Pentium II (320MHz), what is the MIPS rate?

$$\text{MIPS} = \frac{\text{Clock Rate}}{CPI \times 10^{6}} = \frac{320 \times 10^{6}}{3.2 \times 10^{6}} = 100$$
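The effective CPI and MIPS calculations can be sketched as two small functions. The names `effective_cpi` and `mips` and the list-of-pairs representation of the instruction mix are assumptions for illustration.

```python
def effective_cpi(mix):
    """Weighted-average CPI for a mix given as (cpi, frequency) pairs."""
    total = sum(freq for _, freq in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(cpi * freq for cpi, freq in mix)


def mips(clock_rate_hz: float, cpi: float) -> float:
    """Millions of instructions per second: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


mix = [(3, 0.4), (4, 0.4), (2, 0.2)]         # ALU, data transfer, control
cpi = effective_cpi(mix)
print(round(cpi, 2), round(mips(320e6, cpi), 1))   # prints 3.2 100.0
```

Passing a mix that consists only of the cheapest class, for example `[(2, 1.0)]`, gives the peak CPI of 2.0 and therefore the peak MIPS rate for the same clock.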

Ex: Peak CPI and Peak MIPS

Compute the Peak CPI for the following:

Inst. type       CPIi    FREQi    FREQi (peak case)
ALU              3       40%      0%
Data transfer    4       40%      0%
Control          2       20%      100%

(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0

If the processor is Pentium II (320MHz), what is the peak MIPS rate?

$$\text{MIPS} = \frac{\text{Clock Rate}}{CPI \times 10^{6}} = \frac{320 \times 10^{6}}{2 \times 10^{6}} = 160$$


Benchmarking


Multicore Benchmarking Rules

• Do not rely on a single answer
• Match your application requirements
– Small or large data sets
– Few or many threads
– Dependencies
– OS overhead

Two Processor System Utilizing Single Memory Controller

[Figure: Quad Core Processor 1 and Quad Core Processor 2 attach through the Intel Front Side Bus to a North Bridge, which holds the DDR2 interface.]

Processors 1 and 2 must always arbitrate for memory via their front side bus connection through the North Bridge.

Two Processor System Utilizing Dual Memory Controllers

[Figure: Quad Core Processor 1 and Quad Core Processor 2, each with its own DDR2 interface, joined by a link. Access paths are labeled Direct Access, Shared Access, and Doubly Shared Access.]

Two Processor System Utilizing Single Memory Controller

[Figure: Quad Core Processor 1 and Quad Core Processor 2 joined by a link; only Processor 2 has a DDR2 interface.]

• Processor 1 must always access memory by traversing the link to Processor 2
• Requires arbitration to access Processor 2’s memory
• Processor 2 always has prioritized access to this memory since it is directly attached.
• Affinity can help performance

Benchmarking Multicore – What’s Important?

• Measuring scalability
• Memory and I/O bandwidth
• Inter-core communications
• OS scheduling support
• Efficiency of synchronization
• System-level functionality

The Multicore Benchmarking Roadmap

1. Communications
- MCAPI: ultra-light weight

2. Resource Management
- Memory management
- Basic synchronization
- Resource registration
- Resource partitioning

3. Task Management
- Task scheduling

4. Debug

[Figure: “The Four MC Pillars” (Communication, Resource Management, Task Management, Debug) of a multicore system, shown with Virtualization (or OS). Other labels: Adopted stds; MCA Foundation APIs; Value Added Functions (Languages, Programming Models, Design Environments, Application Generators, Benchmarks); Services (Load Balancing, System Mgt., Power Mgt., Reliability, Quality of Service).]